Friday, 9 October 2009

Dodgy Mathematics Exposed #4: The Difference Between Correlation and Causation

It has been said by one Father Larry Lorenzoni that birthdays are good for you, as those who have more of them tend to live longer. This is an excellent example of today's lesson: the line between correlation and causation, which has been so blurred by media outlets as to have become virtually indistinguishable from a sign saying 'Please cross here'.

First, we must explain our terms.

'Correlation' is what happens when two things go together. For instance, it is true that the number of birthdays you have had correlates directly with how old you are (one may have to make a separate category for those born on 29th February, but the principle still holds). It is true that how heavy you are tends to roughly correlate with how tall you are, although admittedly the correlation is getting weaker. It is also probably true that during a flu epidemic the amount of Lemsip sold goes up, while the number of beach umbrellas sold probably goes down (the latter is called negative correlation).

Two things are 'uncorrelated' if they are unrelated, such as the number of spots on my Dalmatian (if I had one), and the frequency of buses on the Cregagh Road.
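For those who like to see the sums, here is a minimal sketch of how 'going together' is usually measured, using Pearson's correlation coefficient (a number between -1 and +1). The weekly figures are entirely made up for illustration, and the helper name pearson_r is my own invention rather than anything official.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How far each pair strays from its average, multiplied together and summed.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

# Invented weekly figures during an imaginary flu epidemic.
flu_cases      = [10, 20, 30, 40, 50]
lemsip_sold    = [120, 190, 310, 390, 520]
umbrellas_sold = [50, 41, 29, 22, 8]

r_lemsip = pearson_r(flu_cases, lemsip_sold)        # close to +1: positive correlation
r_umbrellas = pearson_r(flu_cases, umbrellas_sold)  # close to -1: negative correlation

print(f"flu vs Lemsip:    {r_lemsip:+.2f}")
print(f"flu vs umbrellas: {r_umbrellas:+.2f}")
```

A value near zero would be what you would expect from the Dalmatian and the buses. Note that the number says nothing at all about why the two things move together.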

'Causation' is a different thing, and happens when one thing causes another to happen. For example, when people get colds and flus, they buy Lemsip to get rapid relief from their symptoms. I don't though, because I hate Lemsips, but this is not statistically significant. They are also less likely to go to the beach, and so do not need a beach umbrella.

From this you should be able to see that 'correlation' and 'causation' are not at all the same thing, and should not be confused. There are three Important Things to remember here:

1. Correlation may be coincidental

For instance, it has been pointed out that a decrease in piracy worldwide has correlated with an increase in global warming; this does not, of course, indicate that pirates were good for the environment.
Similarly, the rise of reality TV shows roughly corresponded with my own progression through university, but I think we can safely assume that they did not help me in any way and that I would probably have gone through university regardless.

2. Correlation may arise from a root cause
During a flu epidemic, we have seen that Lemsip sales will increase; it is also likely that more people will take time off work. This does not, however, imply that buying Lemsip makes you take a day off, or that having a sneaky day off makes us all so gleeful that we rush out to Boots and crack open the cold remedies. It is merely that they are both caused by the same thing.
A similar effect happens when you give teenagers injections, and this leads to much suspicion about vaccines such as the cervical cancer one. If you inject a teenage girl with anything at all, including water, or indeed nothing (if, in fact, you simply stick a needle in the arm of a teenage girl), you will find that very often you don't hear the end of it for days. This is not the same thing as a side effect of whatever you injected. Therefore it is not correct to assume that a rise in headaches in the days after a vaccination programme implies that the vaccination causes headaches; in reality, it may be that sticking a needle in the arm provided an excuse for a bit of drama in the mind of a 14-year-old.
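The root-cause effect can be sketched with a small simulation. Everything here is an assumption made for illustration: the flu levels, the multipliers and the amount of random noise are invented, and pearson_r is a hand-rolled helper. In this toy world Lemsip sales and sick days never influence each other; both simply respond to how much flu is about.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

rng = random.Random(2009)  # fixed seed so reruns give the same figures
weeks = 2000

# The root cause: how much flu is going round each week (invented scale 0-100).
flu = [rng.uniform(0, 100) for _ in range(weeks)]

# Both effects depend on flu plus their own independent noise, never on each other.
lemsip = [3.0 * f + rng.gauss(0, 10) for f in flu]
sick_days = [0.5 * f + rng.gauss(0, 3) for f in flu]
r_with_flu = pearson_r(lemsip, sick_days)

# Same recipe, but with the flu level pinned at 50 every week.
lemsip_fixed = [3.0 * 50 + rng.gauss(0, 10) for _ in range(weeks)]
sick_fixed = [0.5 * 50 + rng.gauss(0, 3) for _ in range(weeks)]
r_flu_fixed = pearson_r(lemsip_fixed, sick_fixed)

print(f"flu varying: r = {r_with_flu:+.2f}")
print(f"flu pinned:  r = {r_flu_fixed:+.2f}")
```

With flu varying, the two series should come out strongly correlated; with flu held still, the correlation should all but vanish, which is what you would expect if neither was causing the other in the first place.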

3. Causation may not work in the expected direction
This is where the birthday thing comes in, because Father Larry has extrapolated in the wrong direction: it is not so much that having more birthdays makes you live longer, but that living longer makes you have more birthdays. This phenomenon, which we will call 'Causation Direction Reversal', is a good source of potential humour.
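One reason the direction is so easy to get backwards is that the correlation figure itself is symmetrical: it comes out the same whichever way round you feed the numbers in. A rough sketch, with invented people (one of them born on 29th February) and a hand-rolled helper called pearson_r:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

# Six invented people; the third was born on 29th February and gets short-changed.
years_lived   = [8, 20, 36, 52, 76, 88]
birthdays_had = [8, 20, 9, 52, 76, 88]

r_forwards = pearson_r(years_lived, birthdays_had)
r_backwards = pearson_r(birthdays_had, years_lived)

# The two figures agree, so the number alone cannot say which causes which.
print(f"years -> birthdays: {r_forwards:+.3f}")
print(f"birthdays -> years: {r_backwards:+.3f}")
```

Deciding which way the arrow points takes knowledge from outside the numbers, such as the observation that cake has yet to be shown to extend life.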

I trust that everyone now understands the difference between correlation and causation, and that you will now be better equipped to make up jokes.

jayber crow said...

I laughed a lot while reading this post. I believe my laughter may have caused you to be funny...