Confusing Causation and Correlation
Failing to understand the difference between correlation and causation, and how each concept should be applied, is a common problem that often leads companies to make major strategic mistakes.
Here is a classic example of the difference between correlation and causation. Let’s say you plan to write a white paper on public safety. You conduct research and find that the more firefighters who fight a fire, the more damage the fire causes.
You are surprised. That cannot possibly make sense. So, you research other towns and get the same results. You conduct sufficient research to ensure your results are statistically valid. Even accounting for an appropriate margin of error, clearly more firefighters equals more fire damage.
If you take your results at face value, you might decide that dramatically reducing the number of firefighters will decrease the amount of damage caused by fires. That decision would be a huge mistake, maybe even a catastrophic one.
Why? Because you mistook correlation for causation.
In theory, causation and correlation are easy to distinguish. Correlation exists when two variables (events or actions) tend to change together. Causation exists when one event is the direct result of another event.
The number of firefighters fighting a fire does indeed correlate with the amount of damage caused by the fire. The reason is simple. The bigger the fire, the more firefighters are needed to fight it. And the bigger the fire, the more damage it causes. Originally we correlated two data points, more damage and more firefighters, when in fact both were caused by a third element: bigger fires.
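The firefighter example can be sketched as a small simulation. This is a minimal illustration, not real fire data: the function names, the coefficients, and the noise levels are all made-up assumptions. Fire size drives both crew size and damage, and crew size has no effect on damage at all.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after controlling for a third variable zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_fires(n=5000, seed=0):
    """Generate hypothetical fires where size alone drives crew and damage."""
    rng = random.Random(seed)
    sizes, crews, damages = [], [], []
    for _ in range(n):
        size = rng.uniform(1, 10)              # the hidden third element
        crew = 3 * size + rng.gauss(0, 2)      # bigger fire, more firefighters
        damage = 10 * size + rng.gauss(0, 8)   # bigger fire, more damage
        sizes.append(size)
        crews.append(crew)
        damages.append(damage)
    return sizes, crews, damages


if __name__ == "__main__":
    sizes, crews, damages = simulate_fires()
    print("firefighters vs. damage:", round(pearson(crews, damages), 3))
    print("same, controlling for fire size:",
          round(partial_corr(crews, damages, sizes), 3))
```

In this toy setup the raw correlation between firefighters and damage comes out strongly positive, while the correlation that remains after controlling for fire size is close to zero. That is the pattern you would expect when a third element explains both measurements.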
Here’s another example. Towns with higher ice cream sales have higher drowning rates. Clearly ice cream does not cause drowning. The two facts are correlated but not causal. The causal factor is weather. In towns with higher temperatures, more ice cream is sold, more people swim… and more people drown. A town that attempts to prevent drowning by restricting ice cream sales will obviously miss the mark.
At ForeSee we see businesses miss the mark all the time because they fail to differentiate between causation and correlation.
Say you sell clothing offline and online. One of the shirts you sell is impeccably made and available in a variety of colors. In an offline environment shoppers can easily see the different colors, see the hand-stitching, and view the shirt from all angles. Online is a different story. So, you add an application to your site that allows visitors to change the color of the shirt in the image; zoom in on the details, right down to the stitching and fabric texture; and rotate the shirt to see the sides and back the way potential customers would view it in your store.
After implementing the application, you notice that sales and conversion rates go up. If you made no other changes to your site, ran no promotions, and did not market differently, you can reasonably conclude that your change was a direct, and positive, causal factor.
However, what if you made that change but also ran a special on the shirts at the same time and marketed the promotion through traditional media, online, and social media?
How do you know what worked and what didn’t? Was it the change to the online experience? Was it the advertising? Was it the social media campaign? Or, was it something completely unrelated?
To generate truly meaningful data you must know a lot more about your customers than the fact that they came to your site and either bought a shirt or didn’t. You need to know where they came from, why they visited your site, and what they did while there before you can determine which actions merely correlate and which are truly causal factors.
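As a rough sketch of why that extra detail matters, the snippet below breaks conversion rates down by where visitors came from and whether they used the shirt viewer. The visit records, the field layout, and the function name are hypothetical and exist only for illustration. Comparing rates within each traffic source keeps the promotion from being mistaken for the effect of the viewer, although a segment comparison like this still only shows correlation and does not prove causation on its own.

```python
from collections import defaultdict

# Hypothetical visit records: (traffic_source, used_viewer, purchased)
visits = [
    ("promo_ad", True, True),
    ("promo_ad", True, True),
    ("promo_ad", True, False),
    ("promo_ad", False, True),
    ("promo_ad", False, False),
    ("direct", True, True),
    ("direct", True, False),
    ("direct", False, False),
    ("direct", False, False),
    ("direct", False, True),
    ("direct", False, False),
]


def conversion_by_segment(records):
    """Return the conversion rate for each (traffic_source, used_viewer) segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [purchases, visits]
    for source, used_viewer, purchased in records:
        segment = (source, used_viewer)
        counts[segment][1] += 1
        if purchased:
            counts[segment][0] += 1
    return {seg: bought / total for seg, (bought, total) in counts.items()}


if __name__ == "__main__":
    for segment, rate in sorted(conversion_by_segment(visits).items()):
        print(segment, f"{rate:.0%}")
```

Grouping by source first is a deliberate choice: if promotion traffic converts better regardless of the viewer, a site-wide before-and-after comparison would credit the viewer with a lift it did not produce.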
With precise, scientific measurement technology you can dig deeper and determine what worked and, more importantly, what didn’t, allowing you to apply your resources to the areas that are boosting your bottom line.