Let’s first discuss the idea of correlation and its application in the data
analysis process. Correlation analysis is used to identify and quantify the
relationship between two quantitative variables. The variables can be either
dependent (response) or independent (predictor) variables. The correlation
coefficient r measures the strength and direction of the linear relationship
between the two variables, and it always lies between -1 and +1. The sign of
the correlation coefficient indicates the direction of the association, which
is either positive or negative. The magnitude of the correlation coefficient
indicates the strength of the association. For example, a correlation of
r = 0.95 indicates a strong, positive association between the two variables,
whereas r = -0.3 indicates a weak, negative association. In correlation
analysis, we typically come across four scenarios of association.
Scenario 1 - The two variables have a strong positive correlation (r = 0.9).
Scenario 2 - The two variables have a weak positive correlation (r = 0.3).
Scenario 3 - The two variables do not have any linear correlation (r = 0).
Scenario 4 - The two variables have a strong negative correlation (r = -0.9).
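The coefficient described above is Pearson's r: the sum of the cross-products of each variable's deviations from its mean, divided by the square root of the product of their sums of squared deviations. The sketch below computes it from that definition in Python and labels the four scenarios. The `describe` helper and its 0.7 cut-off between "strong" and "weak" are hypothetical, chosen only for illustration, since conventions vary by field.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe(r, strong_cutoff=0.7):
    """Label r by direction and strength (hypothetical cut-off for illustration)."""
    if r == 0:
        return "no linear correlation"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


# The four scenarios from the text
for r in (0.9, 0.3, 0.0, -0.9):
    print(r, "->", describe(r))

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # 1.0: perfect positive line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # -1.0: perfect negative line
print(pearson_r([1, 2, 3, 4, 5], [4, 1, 0, 1, 4]))  # 0.0: y = (x - 3)**2
```

The last line is a reminder that r = 0 does not mean the variables are unrelated: y is completely determined by x there, but the relationship is not linear, so r cannot detect it.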
Use Case Correlation:
A marketing manager wants to identify the critical variable that affects the
conversion rate of a website.
Business managers want to find out whether the blog update announcing the free
release of online games is driving additional revenue on a given day.
| Day | Visitors - Free Online Games Release Update | Revenue |
| --- | --- | --- |
| 1 | 18000 | 1500 |
| 2 | 12000 | 1200 |
| 3 | 15000 | 1600 |
| 4 | 10000 | 900 |
| 5 | 8000 | 950 |
| 6 | 14000 | 1300 |
| 7 | 12000 | 1100 |
| 8 | 16000 | 1650 |
| 9 | 10000 | 1050 |
| 10 | 20000 | 1600 |
You can use the Excel function CORREL() to calculate the correlation
coefficient between visitors and revenue. For the dataset above, r = 0.90,
which indicates a strong positive relationship between visitors and revenue.
Another example of a strong relationship is rainfall and agricultural output:
when rainfall decreases, agricultural output also decreases. Because both
variables move in the same direction, this is also a positive association.
Correlation analysis also serves as a starting point for further multivariate
statistical analysis.
Correlation does not imply Causation:
When you measure the correlation between two independent variables, or between
a dependent variable and an independent variable, keep in mind that
correlation does not imply causation: events that happen to coincide with each
other are not necessarily causally related. This does not mean that variable X
has no effect on variable Y. It means that the correlation alone cannot tell us
whether X causes Y, Y causes X, a third variable drives both, or the
association is simply a coincidence. To establish that X causes the change in
Y, we have to test that hypothesis further. In the use case above, the
correlation coefficient was 0.90. It only indicates that there is a strong
relationship between our Y variable (revenue) and our X variable (visitors).
We have no proof that an increase in visitors causes revenue to increase. No
cause and effect is implied.