Tuesday, February 27, 2007


OK, so tonight was a slow night... (for once!).
I decided to check how many people in the United States have the same name as I do....
See the results for yourself....

NO PEOPLE??!?!? NONE???!?!?

Now I know that my name is not exactly the most common out there, but I find it hard to believe that no other exists (forget the fact for a moment that I myself don't exist).
I was going to write to these folks and tell them that they are wrong, wrong, wrong... but apparently they KNOW THAT already.
Read their page on "accuracy":


Q: How accurate is this program?

A: More accurate than a Magic 8-ball. Less accurate than distributing and collecting 300 million surveys.

Q: No, really. How accurate?

A: Well, it's hard to say. In order to determine how accurate this program is, we would need a program that was completely accurate for comparison purposes. If we had a program that was completely accurate, we'd use that program instead of this one. At that point, discovering how accurate this program is would no longer be worth the effort. Therefore, we can fairly confidently say that it is impossible to determine how accurate this program is. (Confused? We're just warming up.)

In our completely non-expert opinion, we say that the program gives a decent ballpark estimate, but it shouldn't be used for anything more than that.

Q: Why isn't it more accurate?

A: There are a number of possible sources of inaccuracy:

First and foremost, the program is based upon a convenient fiction. Without getting too technical, the program makes the assumption that a person's first and last names are independent of one another. What this means is it assumes that the probability of a person having a particular first name is the same no matter what last name they have. It isn't.

So, for example: The program assumes that the chance that your first name is "Juan" is the same, regardless of whether your last name is "Arteaga" or "Epstein". Episodes of Welcome Back Kotter aside, we would hazard a guess that there are not that many people in the U.S. actually named "Juan Epstein". Depending upon what your family name is, it makes certain first names more likely, and certain others less likely. The program cannot compensate for that.

Second, the data is old. The data for this program comes from the U.S. Census Bureau's 1990 census. That makes the data about 17 years old or so. This is the most recent name data available from the Census Bureau (the 2000 census did not include name data), but it's still old, and its accuracy may be slightly questionable.

Third, the data appears biased towards more formal versions of names. The data comes from forms mailed to the Census Bureau. It appears most people put the full, formal version of their name on the forms rather than a nickname. So, for example, people who normally call themselves "Bill" would likely tend to put the name "William" on an official Census form. In fact, the data shows the name "William" outnumbering the name "Bill" 20 to 1. So, it appears that nicknames are under-represented in the statistics, and full formal names are over-represented.

Fourth, not every name is on the list. A certain number of instances of a name were required to even make the list. About 10% of all responses were not included on the list because they appeared too few times. So, uncommon names are not represented on the list.

Fifth, we failed to make the required blood sacrifices to the gods of programming and statistics. Surely they will plague our endeavor with errors and inaccuracies.
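
(For the curious: here is my own rough guess at the kind of arithmetic the FAQ is describing. This is only a sketch, not the site's actual code. The frequency numbers and the names `FIRST_NAME_FREQ`, `LAST_NAME_FREQ` and `estimate_count` are made up for illustration. It multiplies the first-name fraction by the last-name fraction, as the independence assumption implies, and treats any name missing from the list as zero, which would explain a result like mine.)

```python
# Rough population figure; the FAQ itself mentions "300 million surveys".
US_POPULATION = 300_000_000

# Hypothetical frequency tables: fraction of people with each name.
# These values are invented for illustration, not real census figures.
FIRST_NAME_FREQ = {"JUAN": 0.002, "WILLIAM": 0.012}
LAST_NAME_FREQ = {"ARTEAGA": 0.00002, "EPSTEIN": 0.00003}


def estimate_count(first, last, population=US_POPULATION):
    """Estimate how many people share a full name, assuming that
    first and last names are independent of one another."""
    # A name that is not on the list gets a frequency of zero.
    p_first = FIRST_NAME_FREQ.get(first.upper(), 0.0)
    p_last = LAST_NAME_FREQ.get(last.upper(), 0.0)
    # Independence: P(first and last) = P(first) * P(last).
    return round(population * p_first * p_last)


if __name__ == "__main__":
    print(estimate_count("Juan", "Epstein"))
    print(estimate_count("Zaphod", "Epstein"))  # first name not on the list
```

Because of the multiplication, one missing name is enough to drag the whole estimate down to zero, no matter how common the other name is.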

Q: Can I use data from this site in my report / project / master's thesis?

A: Sure. However, we take no responsibility for any merciless mocking from your teachers and/or peers for using questionable data.

I'm soooo glad... there are days that I feel invisible enough without this!

YW said...

At least you have the 27562nd most popular last name in the U.S.