# Why you'd rather be diagnosed with a rare disease than a common one

Back in the day when I studied economics, quite a few assumptions underlying economic theory, especially microeconomics, always bugged me. Besides the pretty obviously wrong assumption of “homo oeconomicus”, i.e. that individuals are rational actors, a key requirement for efficient markets is complete information – something even stock markets (arguably the most efficient markets we have managed to create) do not have. Economic activity involves making decisions under uncertainty: we need to deal with randomness and probabilities, and understand them.

In the real world, probabilities don’t follow the simple textbook rules that most people with a secondary education have heard of – rules mostly explained by invoking games of chance. Especially in games with elements of skill, such as poker or backgammon, players need a solid understanding of the probabilities involved, and good players either know them by heart or can calculate them on the spot.

In the short term you can get lucky, but in the long term you will win more games of Yahtzee by going for the bonus and striking out the Yahtzee than the other way round.

As I alluded to in my earlier post on the Monty Hall Problem, games of chance follow “frequentist” statistics. They are clearly bounded (52 cards, 6-sided dice etc.), you can repeat the experiment, i.e. the draw or roll, as often as you like, and every event is identical, so the probabilities are objectively known.

For real-world risk assessment, a frequentist approach needs to assume the average case. That’s fine for insurers, who have the Law of Large Numbers on their side, but you and I aren’t average. “Bayesian” statistics, in contrast, work with individual and subjective probabilities, based on prior knowledge about the risk or event. In this view, probability is a degree of subjective belief, and when your knowledge about the risk changes, so should your belief in its probability.

Unfortunately, Bayesian inference is not very intuitive for most people. Here’s a simple riddle:

Let’s assume there is a disease, Neurological Unilateral Temporal Syndrome (NUTS), with an incidence rate of 1%, i.e. a fairly common disease. There is a test that detects it long before you show any symptoms. The test is pretty reliable – it detects 99% of actual cases, and it comes back positive for 1% of healthy people. That means both the false negative rate and the false positive rate are 1%.

Let’s further assume you read about NUTS, and you go get the test. It comes back positive. What is the probability that you actually are afflicted with NUTS?

The reflexive frequentist-inspired answer would be 99%, and it would be wrong. Let me explain.

Think of the town of Testville, inhabitants 10,000. Let’s test all of them. At 1% incidence rate, exactly 100 people will have NUTS, and 9,900 won’t. Of the 100 afflicted, 99 will test positive, 1 will be a false negative. But of the 9,900 non-afflicted, 9,801 will correctly test negative, while the last 99 will be a false positive. All in all, Testville has 198 positives, 99 of them real, the other 99 false.

For you with your positive test, that means the likelihood that you actually have NUTS is only 99 out of 198 – just 50%.
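The Testville head-count translates directly into a few lines of Python. This is my own illustration (the function name is made up, not from any library); note that the population size cancels out, which is exactly why the counting argument works:

```python
# Posterior probability of having the disease given a positive test,
# computed the "Testville" way: by counting people.
def p_disease_given_positive(incidence, sensitivity, false_positive_rate,
                             population=10_000):
    sick = population * incidence
    healthy = population - sick
    true_positives = sick * sensitivity          # sick people who test positive
    false_positives = healthy * false_positive_rate  # healthy people who test positive
    return true_positives / (true_positives + false_positives)

# NUTS: 1% incidence, 99% sensitivity, 1% false positive rate
print(p_disease_given_positive(0.01, 0.99, 0.01))  # → 0.5
```

Try changing `population` – the answer stays 50%, because only the ratios matter.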

How does this change when we tweak the rates? Let’s look at a second disease, Wildly Asymmetrical Rapid Transfixion Syndrome (WARTS). WARTS is quite rare, afflicting only 0.01% of the population. The test for WARTS has the same reliability as the one for NUTS. Surely, when you test positive now you are more likely to have it, right?

Nope – despite your positive test, you are only 0.98% likely to have WARTS. Scale Testville up to 1 million inhabitants: again, 100 people have WARTS, 99 of whom correctly test positive. But now there are 999,900 healthy individuals, so the 1% false positive rate weighs much more heavily – it produces 9,999 false positives, and only 99 of the resulting 10,098 positives are real.
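Instead of counting inhabitants, the same number falls straight out of Bayes’ theorem. A minimal sketch with the WARTS figures plugged in (variable names are mine):

```python
# Bayes' theorem for a positive test:
# P(D|+) = P(+|D)·P(D) / [ P(+|D)·P(D) + P(+|not D)·P(not D) ]
prior = 0.0001       # WARTS incidence: 0.01%
sensitivity = 0.99   # P(test positive | disease)
false_pos = 0.01     # P(test positive | no disease)

posterior = (sensitivity * prior) / (
    sensitivity * prior + false_pos * (1 - prior)
)
print(f"{posterior:.2%}")  # → 0.98%
```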

So far, this was fun but theoretical. But it has some real practical uses for risk assessment and risk mitigation strategies which even non-statisticians can apply. Consider breast cancer in women. Should you get a mammogram?

Let’s plug the numbers into the spreadsheet (you didn’t think I did this in my head, did you?). The age-adjusted incidence rate both for the US and for Germany is 0.085%, i.e. out of 100,000 women an average of 85 get breast cancer. Mammograms are unfortunately much less accurate than our NUTS and WARTS tests: they only detect around 80% of actual cases (i.e. the false negative rate is 20%), and the false positive rate is quoted at between 7 and 12%. If I use the lower end, 7%, a positive mammogram still means less than a 1% likelihood of actually having breast cancer.
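The spreadsheet calculation amounts to the same Bayes’ theorem arithmetic; here it is as a short Python sketch using the figures above (my own code, not the original spreadsheet):

```python
# Breast cancer after a positive mammogram, via Bayes' theorem.
prior = 0.00085      # age-adjusted incidence: 85 per 100,000
sensitivity = 0.80   # mammograms miss ~20% of actual cases
false_pos = 0.07     # lower end of the quoted 7-12% range

posterior = sensitivity * prior / (
    sensitivity * prior + false_pos * (1 - prior)
)
print(f"{posterior:.2%}")  # → 0.96%, i.e. still under 1%
```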

Of course, the incidence rate I used is an average one. When making the decision, you need to adjust for age, family history and other additional risk factors to get the individual usefulness of a mammogram. This will never be exact, nor does it need to be – we aren’t homo oeconomicus, after all, nor are we “playing enough hands” for the exact odds to make a difference. We only need to be exact enough to weigh the result against the subjective cost – the monetary and physical cost of the mammogram, the emotional cost of the highly likely false positive, etc. – and against the alternative risk containment measures. So the question “should you get a mammogram” has no objectively correct answer. My wife has her own, and it’s the right one for her – as it should be.

To wrap up the frequentist vs. Bayesian discussion: