Importance of Relationships between Variables
When analyzing data, it is extremely important to know the real relation that exists between the different variables studied.
If you have to make an important Business decision, like defining your Marketing strategy, for example, you can’t rely just on the first plausible relation you may find between your sales and the weather
- That is more Superstition than anything else.
Moreover… If you think carefully, all the important decisions we make are based on a Relation between events that happened in the Past.
- A Boss fires an employee because he found a correlation between lazy employees and low productivity.
- A Girlfriend dumps her boyfriend if he is always going out to night clubs with his friends, since she has found an inverse correlation between loyalty and going out with friends until 5:00 am.
Throughout our lives, we assess variables, establish Relationships between them and make decisions based on the result.
The Stronger the relation found, the fewer mistakes we’ll make when making assumptions.
How can we classify the Relations between Variables depending on how strong these Relations are?
We’ll now propose a classification we have created that is very useful for identifying the relationship between 2 (or more) variables:
Correlation and Causation
As you can appreciate, the weakest relationship you can find between 2 (or more) different variables is:
- An Inexplicable Correlation.
Example of Correlation
Imagine that you found a correlation between global political stability and coffee consumption
- We have invented this, of course.
Nobody could exactly explain it, but if you proved a correlation, you could anticipate the coffee Market.
However, you should not take big risks, since you don’t really understand why this relationship takes place.
- Maybe you are missing something important…
On the other hand, the strongest relationship you can find between 2 (or more) different variables is:
- A Demonstrable Causation.
Example of Causation
Now, imagine that you found a higher consumption of coffee during “exam” periods.
- At Christmas time and just before the Summer holidays.
You would have a provable and strong cause-effect relationship between “coffee consumption” and “time of year”.
In the middle, you can find the Correlations that can be explained.
- The ones that “make sense”.
Correlations that are based on Logic rather than on Solid data.
* In our “Data Analysis” page, we explained an interesting correlation Walmart found between Hurricane alerts and Pop-Tarts.
- That would be a “logical” correlation, since after a careful analysis, it makes sense.
Difference between Correlation and Causation
We have been talking about “Correlations” and “Causations” but… What exactly are they?
What is the difference between a Correlation and Causation?
Correlation: It indicates that whenever one variable changes, the other tends to change as well, to some degree.
- Even if they are not directly related.
- The Stronger the Correlation, the more predictable the outcome will be.
Causation: It means that whenever one variable changes, the other will change too, because the first one causes it.
- There is a Direct Relation between both Variables.
- The Outcome can be perfectly Predicted.
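To make the idea of a “strong” or “weak” correlation more concrete, here is a minimal Python sketch of Pearson’s correlation coefficient, one common way to measure it. The article itself does not prescribe any particular measure, so treat this choice as our assumption; the sample numbers are invented for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented numbers, for illustration only (not real sales data).
temperature = [14, 18, 21, 25, 29, 33]
ice_cream_sales = [110, 135, 160, 210, 250, 300]

if __name__ == "__main__":
    r = pearson(temperature, ice_cream_sales)
    print(f"r = {r:.2f}")
```

A value close to 1 or -1 means a strong relation, and a value close to 0 means a weak one. Note that the coefficient says nothing about whether one variable causes the other.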
Since this can sometimes be confusing, we’re going to share some examples with you.
- We’ll classify the relationship between the different variables.
You’ll understand it much better.
Correlation vs Causation Examples
Since this Topic can be a bit “boring”, we’ll try to use “funny” examples.
- Using Google Trends.
Example of Correlation
Imagine you work for an Ice Cream company.
- And you are currently looking to optimize your Marketing campaign.
Hence, you decide to look for different Correlations or Causations between Ice cream consumption and other potential indicators.
- So you can “beat” the Market.
After several failed attempts, you find a fascinating correlation:
- The interest in Sharks and the interest in Ice Creams seem to be related.
What is going on?
Of course, these 2 variables are connected through a third one: the “weather”.
- The hotter the weather is, the more people will be swimming at the beach and the more concerned they will be about Sharks.
Logically, in summer, people consume more Ice Creams.
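The role of the weather as a hidden “connector” can be sketched with a small simulation. Everything below is our assumption for illustration: the data is randomly generated (not taken from Google Trends), the coefficients are arbitrary, and the partial correlation is just one standard way to check what remains of a correlation once a third variable is accounted for.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate(n=1000, seed=0):
    """Generate invented data where temperature drives both interests."""
    rng = random.Random(seed)
    temperature = [rng.uniform(0, 35) for _ in range(n)]
    # Neither series depends on the other; both depend only on temperature.
    ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
    sharks = [1.5 * t + rng.gauss(0, 5) for t in temperature]
    return temperature, ice_cream, sharks


if __name__ == "__main__":
    temperature, ice_cream, sharks = simulate()
    print("raw correlation:    ", round(pearson(ice_cream, sharks), 2))
    print("partial correlation:", round(partial_correlation(ice_cream, sharks, temperature), 2))
```

In this simulated setting the raw correlation is high, while the partial correlation (with temperature held fixed) stays close to zero, which is what you would expect when a third variable explains the link.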
Although the relationship between these 2 variables does not seem very useful, it can be perfectly explained, and the correlation is relatively strong (although, again, useless).
Moreover, for our example, this correlation would not be very useful since the interest in Ice creams seems to come before the interest in Sharks.
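Whether one series “comes before” the other can be checked by correlating them at different time shifts. The sketch below uses two invented monthly series (simple sine waves, with the Sharks series built to trail the Ice Cream series by one month); the function names `lagged_correlation` and `best_lag` are ours, not part of any library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag], for lag >= 0."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


def best_lag(leader, follower, max_lag):
    """Return the lag (0..max_lag) with the highest correlation."""
    return max(range(max_lag + 1),
               key=lambda lag: lagged_correlation(leader, follower, lag))


# Invented seasonal series: 48 months, yearly cycle, Sharks one month behind.
months = range(48)
ice_cream = [math.sin(2 * math.pi * t / 12) for t in months]
sharks = [math.sin(2 * math.pi * (t - 1) / 12) for t in months]

if __name__ == "__main__":
    print("best lag (months):", best_lag(ice_cream, sharks, max_lag=6))
```

If the best lag is positive, the first series leads the second. In that case the second series (here, Sharks) cannot help you anticipate the first (Ice Creams), which is why the correlation would be of little use for the Marketing campaign.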
This is an Explainable correlation: although the two variables / events are not directly connected, there is a provable and understandable relation between them.
Car breakdowns and Global Economy - Causation example
It was 2010 and I was driving on a Highway.
- I couldn’t believe how many cars there were on the shoulder.
What was going on? Why were so many cars broken down?
2010 was a very tough year for the Global Economy.
Lots of families were in a very difficult financial situation.
- And one of the first things people stopped paying for was their periodic car inspection.
Why is this causation and not correlation?
Because the worse the global economic situation is, the more car breakdowns there will be.
- One variable causes the other directly, even if there are several steps in the middle.
A bad global economic situation causes more people to lose their job.
That causes them to stop paying for certain non-vital expenses.
- That includes periodic car inspections, which causes cars to break down at higher rates.
Hence, this is a Causation relationship, since one variable causes the other to happen.
Chips and Coke - Inexplicable good correlation
This is a very strange yet good correlation.
People who consume sugar-based beverages tend to consume more chips (Potato chips, Doritos…).
Maybe they consume these beverages because they are “addicted” to sugar and hence, they “need” a salty flavor in order to balance the taste.
Maybe they tend to care less about their health and so they have no problem with consuming “junk” food.
There is no strong logic behind this relation, unlike the one we established in the car breakdown example explained previously.
But… what is the correlation?
- The interest in, or consumption of, Chips is correlated with the interest in sweet beverages (mainly, Coca Cola and Pepsi).
In this Worldwide Google Trends search, we compared the interest in:
- Coke price.
- Chips price.
- Chocolate price.
We included Chocolate price as our “control” or reference, since chocolate is also something you could consume together with chips and it is also sweet, like a Coke.
As you can appreciate, while the interest in Coke and Chips behaves very similarly, the overall evolution of the interest in “Chocolate price” is very different.
We would classify these 2 variables (Coke and Chips consumption / interest) as follows.
This is an Inexplicable Correlation, since there is no strong logic that explains why these 2 consumption habits are commonly related.
This is just an example, of course. This is not the most accurate consumption study you may find. We are aware.
- Maybe, the marketing campaigns developed during last decades made us associate Chips and Sweet beverages…
We just wanted to give you an example of correlations, preferences… that may initially seem very logical but really are not.
* However, Pepsi-Cola apparently realized how close these 2 variables / preferences / products are: in 1965 it merged with Frito-Lay, the maker of Lay’s, to form PepsiCo.
Why are Correlations important?
You may be thinking: “How can all of this be useful?”
If you think about it, we all tend to make important Business decisions based on certain tendencies or “correlations” between variables that we have experienced.
- Consumer behaviors.
- Relationships between different consumers’ preferences.
- Geographic tendencies.
What makes us take a decision is a Correlation or a Causation.
It is very important to properly classify and analyze the events that pushed us towards taking a decision.
- The stronger the relationship between factors the more predictable the outcome will be.
- The more predictable the result, the lower the risk you’ll assume.
- And the lower the risk you assume, the more you can invest in it.
Whenever you make an important decision, you are taking it based on a certain correlation between events or variables.
It is important to assess whether those variables have a strong relationship or a weak one.
In order to classify the relationship between different variables in an easy and useful way, you can use three categories:
- Inexplicable Correlations.
- Explainable Correlations.
- Demonstrable Causations.
The stronger the relationship is, the more predictable the outcome will be.
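As a recap, the three categories used in this article (Inexplicable Correlation, Explainable Correlation and Demonstrable Causation) can be written down as a tiny Python sketch. The two yes/no questions used as inputs are our own simplification of the reasoning above, not a formal test, and the function name `classify` is hypothetical.

```python
from enum import Enum


class Relationship(Enum):
    """The three categories, from weakest to strongest."""
    INEXPLICABLE_CORRELATION = "Inexplicable Correlation"
    EXPLAINABLE_CORRELATION = "Explainable Correlation"
    DEMONSTRABLE_CAUSATION = "Demonstrable Causation"


def classify(has_logical_explanation, mechanism_demonstrated):
    """Classify an observed correlation between two variables.

    has_logical_explanation: is there a story that "makes sense"?
    mechanism_demonstrated: can you show that one variable causes the other?
    """
    if mechanism_demonstrated:
        return Relationship.DEMONSTRABLE_CAUSATION
    if has_logical_explanation:
        return Relationship.EXPLAINABLE_CORRELATION
    return Relationship.INEXPLICABLE_CORRELATION


if __name__ == "__main__":
    # The three examples from this article, answered as the article does.
    print("Sharks / Ice Creams:", classify(True, False).value)
    print("Economy / Car breakdowns:", classify(True, True).value)
    print("Chips / Coke:", classify(False, False).value)
```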