Below is an illustration of two games, left and right, and we must choose one of them to play 100 times. After each game we restore the bins to their original types and quantities. The object is to end up with the item T as accurately as possible, which means we also want to minimize the chance of accidentally including F. Which game should we choose?
In the games, there are bins to draw items from or bins to put items into; these are represented by the rectangles with rounded corners. The contents of each bin are one or more of the following. There are True items, “T,” which have the characteristic “C.” There are False items, “F,” which also have the same characteristic “C.” Finally, there are items, “N,” which do not have the characteristic “C” (and so, by testing for “C,” the “N” can be filtered out of the results).
For example, on the left, we draw from a bin of 100 items at random. T would be selected, on average, about 2 times in every 100 tries, but we would also mistakenly draw F, on average, about 2 times. We end with two of T, but we have also confused two of F with T, yielding a 50% error rate. In the game on the right, by contrast, the error rate is only about 2%. What makes the difference between these two sorting systems?
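The left game's expected outcome can be checked with a few lines of arithmetic. The bin composition used here (2 T, 2 F, and 96 N out of 100 items) is an assumption inferred from the averages quoted above, not stated explicitly in the text:

```python
# Expected outcome of the left game. The bin composition, 2 T + 2 F + 96 N,
# is an assumption inferred from the averages quoted in the text.
n_plays = 100
p_t = 2 / 100          # probability of drawing a T on any single play
p_f = 2 / 100          # probability of drawing an F on any single play

expected_t = n_plays * p_t      # about 2 T over 100 plays
expected_f = n_plays * p_f      # about 2 F over 100 plays

# Testing for characteristic C removes the N items, leaving only T and F,
# so the error rate among the survivors is F / (T + F).
error_rate = expected_f / (expected_t + expected_f)
print(expected_t, expected_f, error_rate)   # 2.0 2.0 0.5
```

The exclusionary step does not change the ratio of F to T; it only removes the N dilution, which is why the error rate stays at 50%.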
In the diagram on the right, we begin with a coin toss. If the coin lands heads, we draw from the left bin; if tails, we draw from the right bin. Thus, there is a 50% chance of drawing from the first bin, which gives us T. (To bring this back to FH, this represents the fact that 50% of those who are first degree relatives of verified mutation carriers will have the mutation.) On the other hand, 50% of the coin tosses will land tails, in which case we draw at random from the bin of 100 items, where our chance of drawing F is only 2%. Drawing from these two bins 100 times according to the flip of a coin, we end, on average, with 51 T to 1 F, but diluted by the presence of N. This dilution with N, in both games, is resolved in an exclusionary step: there is a characteristic, C, which distinguishes T and F from N. After we remove all the items lacking characteristic C, we are left with a more concentrated pool. The game on the right is more accurate than the one on the left because we can get T with a flip of a coin. (As for FH, this is the same as finding a verified mutation carrier and then beginning with a first degree relative. Because this relative has a 50-50 chance of carrying the mutation, we begin with a superior “prior probability,” and we are successful 50% of the time with the first step alone.)
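The right game's expected counts follow the same arithmetic, now split by the coin toss. Again, the heads bin is assumed to contain only T, and the tails bin is assumed to be the same 100-item mix of 2 T, 2 F, and 96 N:

```python
# Expected outcome of the right game. The heads bin is assumed to contain
# only T; the tails bin is the assumed 100-item mix (2 T, 2 F, 96 N).
n_plays = 100
p_heads = 0.5           # heads: draw from the bin containing only T
p_t_tails = 2 / 100     # tails: 2% chance of T from the mixed bin
p_f_tails = 2 / 100     # tails: 2% chance of F from the mixed bin

expected_t = n_plays * (p_heads * 1.0 + (1 - p_heads) * p_t_tails)  # about 51 T
expected_f = n_plays * (1 - p_heads) * p_f_tails                    # about 1 F

# After filtering out N by testing for C, the error rate is F / (T + F).
error_rate = expected_f / (expected_t + expected_f)
print(round(expected_t), round(expected_f), round(error_rate, 3))   # 51 1 0.019
```

The 51-to-1 split reproduces the average quoted above, and 1/52 is just under 2%, matching the stated error rate.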
What makes the difference between these two sorting systems? On the right, the odds of T on the first step are 1 in 2. All things being equal, a strategy that begins from the base rate is vastly superior to one that commits the base rate fallacy – unless you seek to profit from the error. (Of course, to replicate a genetic inheritance scheme perfectly, I would need to add the fact that there are two parents, each with two alleles, and so on …. However, we are not making a point about genetics here but about mathematics, so for the most part we have stripped away the genetic backdrop in order to isolate a single mathematical problem at center stage: the base rate fallacy. I invite the reader to add the elements of a more complete diagram of inheritance and then make the relevant adjustments to the probabilities. Although some of the numbers will change, the greater part of the difference between the results will still lie with the base rate.)
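The comparison between the two sorting systems can also be checked empirically. The following is a minimal Monte Carlo sketch, again assuming the mixed bin holds 2 T, 2 F, and 96 N, and that the heads bin in the right game holds only T:

```python
import random

# Assumed 100-item mix: 2 T, 2 F, 96 N (inferred from the averages in the text)
BIN = ["T"] * 2 + ["F"] * 2 + ["N"] * 96

def play_left(rng):
    # One play of the left game: a single random draw from the mixed bin.
    return rng.choice(BIN)

def play_right(rng):
    # One play of the right game: heads yields a guaranteed T;
    # tails falls through to a draw from the same mixed bin.
    return "T" if rng.random() < 0.5 else rng.choice(BIN)

def error_rate(play, rounds, rng):
    draws = [play(rng) for _ in range(rounds)]
    kept = [d for d in draws if d != "N"]   # testing for C filters out the N items
    return kept.count("F") / len(kept)

rng = random.Random(7)
left = error_rate(play_left, 100_000, rng)
right = error_rate(play_right, 100_000, rng)
print(round(left, 2), round(right, 2))   # roughly 0.5 and 0.02
```

Raising the number of rounds tightens both estimates toward 50% and just under 2%; the gap between the two games comes entirely from the coin flip's superior prior.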