Monday 21 September 2015

Quantitative Analysis 3 - Correlation does not imply Causation


Let’s first discuss the idea of correlation and its application in the data analysis process. Correlation analysis is used to identify and quantify the relationship between two quantitative variables. The two variables may be a dependent (response) variable and an independent (predictor) variable, or two independent variables. The correlation coefficient r measures the association between them and ranges from -1 to +1. The sign of r indicates the direction of the association, which is either positive or negative, and the magnitude of r indicates its strength. For example, a correlation of r = 0.95 indicates a strong, positive association between the two variables, whereas r = -0.3 indicates a weak, negative association. In correlation analysis, we typically come across four scenarios of association.


 
Scenario 1 - The two variables have a strong positive correlation where r = 0.9
Scenario 2 - The two variables have a weak positive correlation where r = 0.3
Scenario 3 - The two variables do not have any correlation where r = 0
Scenario 4 - The two variables have a strong negative correlation where r = -0.9

Use Case Correlation:

A marketing manager wants to identify the critical variable that is affecting the conversion rate of a website.

Business managers want to find out whether a blog update announcing the free release of online games is causing additional revenue on the given day.

Day    Visitors - Free Online Games Release Update    Revenue
1      18000                                          1500
2      12000                                          1200
3      15000                                          1600
4      10000                                          900
5      8000                                           950
6      14000                                          1300
7      12000                                          1100
8      16000                                          1650
9      10000                                          1050
10     20000                                          1600

You can use the Excel function CORREL() to compute the correlation coefficient and measure the relationship between visitors and revenue. The correlation coefficient r for the above dataset is 0.90, which indicates a strong positive relationship between the variables visitors and revenue. Another example of a strong positive relationship is rainfall and agricultural output: whenever rainfall decreases, agricultural output also decreases. Correlation analysis is also a starting point for extending the analysis to multivariate statistics.
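If you want to check the figure outside Excel, here is a minimal Python sketch of the Pearson correlation coefficient (the same quantity CORREL() returns) applied to the ten days in the table above. The function name pearson_r and the variable names are my own choices for illustration and do not come from any library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Data copied from the table above (days 1 to 10).
visitors = [18000, 12000, 15000, 10000, 8000, 14000, 12000, 16000, 10000, 20000]
revenue = [1500, 1200, 1600, 900, 950, 1300, 1100, 1650, 1050, 1600]

r = pearson_r(visitors, revenue)
print(round(r, 2))  # 0.9
```

The sign of r gives the direction of the association and its absolute value gives the strength, so the value here is read as a strong positive relationship.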

Correlation does not imply Causation:

When you find a correlation between two independent variables, or between a dependent variable and an independent variable, keep in mind that correlation does not imply causation: events that happen to coincide with each other are not necessarily causally related. A correlation alone does not show that variable X has an effect on variable Y; the association may be just a coincidence. We have to further validate, by testing a hypothesis, that X is causing the effect on Y. In the above use case, we found that the correlation coefficient was 0.90. It only indicates that there is a strong relationship between our Y variable, revenue, and the X variable, visitors. However, we do not have any proof that an increase in visitors is what causes the revenue to increase. No cause and effect is implied.
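One way to see why a high r is not proof of cause and effect is to imagine a hidden third factor that moves both variables. The sketch below uses made-up numbers (not the dataset above) and a hypothetical promotion period that lifts both visitors and revenue. Everything in it, including the names corr, normal_days and promo_days, is an assumption for illustration only.

```python
import math

def corr(xs, ys):
    """Pearson correlation coefficient (plain-Python helper for this sketch)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (visitors, revenue) pairs, invented for illustration.
# During the promotion both numbers are higher, but inside each period
# revenue does not move with visitors at all.
normal_days = [(10, 110), (12, 90), (11, 90), (13, 110)]
promo_days = [(20, 210), (22, 190), (21, 190), (23, 210)]

def split(pairs):
    """Turn a list of (visitors, revenue) pairs into two separate lists."""
    return [v for v, _ in pairs], [rev for _, rev in pairs]

r_all = corr(*split(normal_days + promo_days))
r_normal = corr(*split(normal_days))
r_promo = corr(*split(promo_days))

print("all days:   ", round(r_all, 2))     # strong positive correlation
print("normal days:", round(r_normal, 2))  # no correlation
print("promo days: ", round(r_promo, 2))   # no correlation
```

In this toy data the overall correlation is strong only because the promotion raises both numbers together; within each period visitors and revenue are unrelated. This is the kind of alternative explanation that has to be ruled out before claiming that X causes Y.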
