Saturday, November 13, 2004

Improbably True

Recently the Wizard of Odds tried to explain a poker problem with a curious tidbit. Suppose you know two women, A and B, who each have exactly two children. Woman A says "I have at least one son," while Woman B says "I have a child named Jacob." It turns out that there is about a 67% chance that Woman A has a son and a daughter, and a 33% chance that she has two sons. By contrast, there is a 50% chance that Woman B has a son and a daughter, and a 50% chance that she has two sons. How could this be? It seems like "I have a child named Jacob" gives you the same information as "I have at least one son." The Wizard tries to explain that the distinction somehow relies on the words "at least," but this doesn't seem especially satisfactory.

I've spent some time puzzling through this result, and I think I can explain it a little more clearly. First, you have to understand how it is that Woman A has a 67% chance of having a son and a daughter. If we ignore complications such as identical twins, the odd case of transgender children, and the fact that about 106 boys are born for every 100 girls, we find that there are four equally likely combinations of children for Woman A:

1. Two girls
2. An older girl and a younger boy
3. An older boy and a younger girl
4. Two boys

Three of these four possibilities are allowed by the statement "I have at least one son." In two of these three equally likely cases Woman A also has a daughter, leaving a 67% chance that Woman A has a son and a daughter.
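For readers who like to check this sort of thing by brute force, here is a minimal Python sketch of the counting argument above. It assumes, as the post does, that the four combinations are equally likely; the variable names are just illustrative.

```python
from fractions import Fraction
from itertools import product

# The four equally likely (older, younger) combinations for a two-child family.
families = list(product("GB", repeat=2))

# Keep only the families consistent with "I have at least one son."
with_son = [f for f in families if "B" in f]

# Among those, pick out the families that also include a daughter.
mixed = [f for f in with_son if "G" in f]

p_mixed = Fraction(len(mixed), len(with_son))
p_two_sons = 1 - p_mixed
print(p_mixed, p_two_sons)  # prints "2/3 1/3"
```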

This is actually misstating things a little bit. There is a 100% chance that any real Woman A has exactly the children that Woman A has. More precisely, we are stating that if you had a sufficiently large genealogical database, selected from it all women with exactly two children, and then from that list selected only those who could truthfully say "I have at least one son," you would find that about 2/3 of the women selected have one son and one daughter.

Let's look at Woman B's case, now. It is tempting to think that the statement "I have a child named Jacob" is equivalent to "at least one of my children is a boy." We are, of course, assuming that the number of girls named Jacob is vanishingly small. The curious thing is that if we went through our same database and selected all the women who have exactly two children, and from those, the women who have a son named Jacob, we would find that about half of them have one son and one daughter, and about half of them have two sons. The reason for this is that each boy has roughly an equal probability of being named Jacob. We ignore the fact that it is unlikely that a woman would name two sons Jacob, although this does slightly skew the results.
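The database thought experiment can be approximated with a small simulation. The sketch below is only an illustration under stated assumptions: the `simulate` function, the family count, and the 2% chance that a boy is named Jacob are all made up for the example, and each boy is named independently (so the "no mother names two sons Jacob" effect is ignored, as in the text).

```python
import random

def simulate(n_families, p_jacob, seed=0):
    """Return the share of two-son families among families with a son named Jacob."""
    rng = random.Random(seed)
    jacob_families = 0
    jacob_two_sons = 0
    for _ in range(n_families):
        # Each child is independently a boy or a girl with equal probability.
        sexes = [rng.choice("BG") for _ in range(2)]
        # Each boy is independently named Jacob with probability p_jacob (an assumed value).
        has_jacob = any(s == "B" and rng.random() < p_jacob for s in sexes)
        if has_jacob:
            jacob_families += 1
            if sexes == ["B", "B"]:
                jacob_two_sons += 1
    return jacob_two_sons / jacob_families

share = simulate(400_000, 0.02)
print(f"Share of Jacob families with two sons: {share:.3f}")
```

With a small naming probability, the printed share should land close to one half rather than one third.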

Basically, a woman with two sons is about twice as likely to have a son named Jacob as a woman with only one son. Start with our set of possible candidates for Woman A, in which 2/3 of the entries have a son and a daughter and 1/3 have two sons, and randomly assign names to all the sons in that set. If the probability that a boy is named Jacob is X, then a candidate with two sons who also has a Jacob turns up with probability about 2X * 1/3, and a candidate with a son and a daughter who also has a Jacob turns up with probability X * 2/3. (The 2X is an approximation: the exact chance that at least one of two sons is named Jacob is 2X - X^2, which is very close to 2X when X is small.) Both products come to 2X/3, so the two cases are about equally likely: roughly 50% of the candidate Woman Bs have two sons.
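To see how good the "twice as likely" approximation is, the same calculation can be done exactly. This is a sketch under the post's assumptions (equally likely sexes, each boy named Jacob independently with probability X); the function name is hypothetical. It weights the families by 1/4 and 1/2 instead of 1/3 and 2/3, which gives the same ratio.

```python
from fractions import Fraction

def p_two_sons_given_jacob(x):
    """Exact chance of two sons, given two children and at least one son named Jacob."""
    # Two sons (prior weight 1/4): at least one of the two boys is named Jacob.
    two_sons = Fraction(1, 4) * (1 - (1 - x) ** 2)
    # One son and one daughter (prior weight 1/2): the single boy is named Jacob.
    mixed = Fraction(1, 2) * x
    return two_sons / (two_sons + mixed)

for x in (Fraction(1, 100), Fraction(1, 10), Fraction(1)):
    print(x, p_two_sons_given_jacob(x))
# prints:
# 1/100 199/399
# 1/10 19/39
# 1 1/3
```

The result simplifies to (2 - X)/(4 - X), which is just under 1/2 for a rare name. The X = 1 row is a handy sanity check: if every boy were named Jacob, "I have a son named Jacob" would mean exactly "I have at least one son," and the answer falls back to Woman A's 1/3.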

I hope this clears up some of the confusion.
