The importance of relationships between variables
When analyzing data, it is extremely important to know the real relation that exists between the different variables studied.
If you have to make an important business decision, such as defining your marketing strategy, you can’t rely on the first plausible relationship you find between, say, your sales and the weather, even if that relationship looks strong.
Moreover, if you think about it carefully, all the important decisions we make are based on a correlation or relationship between events we have noticed in the past.
- A boss will fire an employee because he has found a correlation between lazy employees and low productivity.
- A girlfriend will dump her boyfriend if he is always going out to night clubs with his friends, because she has found an inverse correlation between loyalty and going out with friends until 5:00 am.
Throughout our lives, we assess variables, establish relationships between them and make decisions based on the result.
The stronger the relationship we find, the fewer mistakes we’ll make when acting on those assumptions.
How can we classify our variables’ relationships depending on how strong they are?
We’ll now propose a classification we have created that makes it easy to identify the relationship between 2 (or more) different variables:
Variables relationship strength classification
As you can see, the weakest relationship you may find between 2 (or more) different variables is:
- An Inexplicable Correlation.
* For example: a correlation between global political stability and coffee consumption (an invented example).
Nobody could explain it exactly, but if you had proved the correlation, you could sometimes use it to anticipate the coffee market.
However, you should not take big risks based on it, since you don’t really understand why this relationship exists… maybe you are missing something important.
On the other hand, the strongest relationship you can find between 2 (or more) different variables is:
- A demonstrable Causation between both of them.
* For example: If you found higher coffee consumption during exam periods – around Christmas and just before the summer holidays – you would have a provable and strong cause-effect relationship between the “coffee consumption” and “time of year” variables.
In the middle, you can find the Correlations that can be explained. The ones that “make sense”.
In our “Data Analysis” page, we explained a very interesting correlation Walmart found between hurricane alerts and Pop-Tarts.
- That would be a “logical” correlation since, after a careful analysis, it makes sense.
Difference between Correlation and Causation
We have been talking about “Correlations” and “Causations” but… what exactly are they? What is the difference between a Correlation and a Causation?
A Correlation between 2 (or more) different variables indicates that whenever one variable changes (increases, decreases…), the other tends to change as well, even though the two may not be directly related.
A Causation relationship between 2 variables means that whenever the first variable changes, the second one changes too, because the first one causes it.
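To make the idea of a correlation more concrete, here is a minimal Python sketch of the Pearson correlation coefficient, one common way (our choice here, not the only one) of measuring how closely two variables move together. The `pearson` helper and the weekly figures are made up for illustration. A value near +1 or -1 means a strong linear relationship and a value near 0 a weak one; the number says nothing about whether one variable causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each point from its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Made-up weekly figures, for illustration only (not real data).
temperature = [12, 15, 19, 23, 27, 30, 33]
ice_cream_sales = [20, 24, 31, 40, 47, 55, 60]
umbrella_sales = [50, 44, 37, 30, 22, 17, 11]

# Positive: both series rise together.
print(round(pearson(temperature, ice_cream_sales), 2))
# Negative: one series falls as the other rises.
print(round(pearson(temperature, umbrella_sales), 2))
```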
Since this can sometimes be confusing, we’re going to share some examples with you.
- Moreover, we’ll classify the relationship between the variables in each example according to its strength, using the classification we explained before.
Correlation vs Causation Examples
Since this topic can be a bit “boring”, we’ll try to use “funny” examples to help you understand the difference between a Correlation and a Causation.
Sharks and Ice creams - Correlation example
Imagine you work for an ice cream company, and you are looking to optimize your marketing campaign.
Hence, you decide to look for correlations or causations between ice cream consumption and other potential indicators so you can “beat” the market.
After several failed attempts, you find a fascinating correlation:
- The interest in sharks and the interest in ice cream seem to be related.
What is going on?
Of course, these 2 variables are connected through a third one: the weather.
- The hotter the weather is, the more people will be swimming at the beach, and the more concerned they will be about sharks.
Logically, people also consume more ice cream in summer.
Although this relationship is not very useful, it can be perfectly explained, and the correlation is relatively strong.
- For our example, it would not help much anyway, since the interest in ice cream seems to come before the interest in sharks, so shark interest could not be used to anticipate ice cream demand.
This is an Explainable Correlation: although the two variables / events are not directly connected, there is a provable and understandable relationship between them.
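The role of the weather can be sketched with a small simulation. The Python code below is only an illustration under invented assumptions: the numbers, the `simulate` function and the linear link between temperature and each interest are all hypothetical. It generates ice cream and shark interest that both depend on temperature and never on each other, and then compares the raw correlation with the correlation left once the temperature effect is removed from both series.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares straight-line fit on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def simulate(n=1000, seed=0):
    """Hypothetical data: both interests depend on temperature, not on each other."""
    rng = random.Random(seed)
    temperature = [rng.uniform(10, 35) for _ in range(n)]
    ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
    sharks = [1.5 * t + rng.gauss(0, 5) for t in temperature]
    return temperature, ice_cream, sharks


temperature, ice_cream, sharks = simulate()

# Correlation as you would first observe it in the data.
raw = pearson(ice_cream, sharks)
# Correlation once the temperature effect is removed from both series.
controlled = pearson(residuals(ice_cream, temperature),
                     residuals(sharks, temperature))

print(f"raw correlation: {raw:.2f}")
print(f"after removing temperature: {controlled:.2f}")
```

With these made-up settings, the raw correlation should come out strongly positive while the controlled one stays close to zero, which is what you would expect when a third variable drives both.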
Car breakdowns and Global Economy - Causation example
It was 2010.
I was driving on a Highway and I couldn’t believe how many cars there were on the shoulder.
What was going on? Why were so many cars broken down?
2010 was a very tough year for the global economy.
Lots of families were in a very difficult financial situation, and one of the first things people stopped paying for was their periodic car inspection.
Why is this causation and not correlation?
Because the worse the global economic situation is, the more car breakdowns there will be.
- One variable causes the other directly, even if there are several steps in the middle.
A bad global economic situation causes more people to lose their jobs, which causes them to cut certain non-vital expenses, including periodic car inspections, which in turn causes cars to break down at higher rates.
Hence, this is a Causation relationship, since one variable/ event will cause the other to happen.
Chips and Coke - Inexplicable good correlation
This is a very strange yet good correlation.
People who consume sugar-based beverages tend to consume more chips (potato chips, Doritos…).
- Maybe they consume these beverages because they are “addicted” to sugar and hence “need” a salty flavor to balance the taste.
- Maybe they tend to care less about their health and so have no problem consuming “junk” food.
There is no strong logic behind this relationship like the one we established in the car breakdown example.
But… what is the correlation?
- The interest in, or consumption of, chips is correlated with that of sweet beverages (mainly Coca-Cola and Pepsi).
In this Worldwide Google Trends search, we compared the interest in:
- Coke price.
- Chips price.
- Chocolate price.
We included “Chocolate price” as our “control” or reference, since chocolate is also something you could consume together with chips and, like a Coke, it is sweet.
- As you can see, while the interest in Coke and Chips behaves very similarly, the overall evolution of interest in “Chocolate price” is very different.
We would classify these 2 variables (Coke and Chips consumption / interest) as follows:
This is an Inexplicable Correlation, since there is no strong logic that explains why these 2 consumption habits are commonly related.
This is just an example, of course. We are aware that it is not the most accurate consumption study you may find.
- Maybe the marketing campaigns run over the last decades made us associate chips with sweet beverages…
We just wanted to give you an example of a correlation between preferences that may initially seem very logical but really is not.
* However, PepsiCo seems to have realized how close these 2 variables / preferences / products are: the company was formed in 1965 through the merger of Pepsi-Cola and Frito-Lay, the maker of Lay’s.
Why are Correlations important?
You may be thinking: “How can all of this be useful?”
If you think about it, we all tend to make important business decisions based on certain tendencies or “correlations” we have noticed between variables.
- Consumer behaviors.
- Relationships between different consumers’ preferences.
- Geographic tendencies.
What makes us take a decision is a correlation or a causation we have noticed.
It is very important to properly classify and analyze the events that pushed us towards that decision.
- The stronger the relationship between variables/ events the more predictable and solid the outcome will be.
- The more predictable the result, the lower the risk you’ll assume.
- And the lower the risk you assume, the more you can invest in it.
Whenever you make an important decision, you are basing it on a certain correlation between events or variables.
It is important to assess whether those variables have a strong relationship or a weak one.
To classify the relationship between different variables in an easy and useful way, you can sort them into three categories:
- Inexplicable Correlations.
- Explainable Correlations.
- Causations.
The stronger the relationship is, the more predictable the outcome will be.