Faulty correlation

Question

I'm new to Python and pandas/matplotlib. I'm trying to calculate the correlation between the closing stock prices of Disney and Netflix (as an example), but I'm not sure I've done it correctly. When I output my data as seen in the picture below, it looks weird and not what I expected (I expected a single row showing the correlation between the two stocks).

What is the best/easiest way to calculate the correlation between the two closing stock prices, and how can I make the output look better? Any tip or help is appreciated!

Please provide an example of your dataframe. If your dataframe consists only of closing prices, you have done it correctly. – Roim, Sep 30 '20 at 07:38

Grayrigel · Accepted Answer · 2020-09-30T09:34:43.920

If you want just the correlation between two columns, you can use the pearsonr function in scipy.stats, which returns the Pearson correlation coefficient and the p-value.

Try this:

#input test data

>>> newData
        DIS      NFLX
0  0.620575  0.122005
1  0.124085  0.380087
2  0.286652  0.218533
3  0.569696  0.511214
4  0.081106  0.114614
5  0.223516  0.677468
6  0.226528  0.474243
7  0.998798  0.099523
8  0.994585  0.429352
9  0.277520  0.882989

>>> from scipy import stats
>>> corr, p_value = stats.pearsonr(newData['DIS'].values, newData['NFLX'].values)
>>> print(corr)
-0.25752281938162824
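
As a rough cross-check of what pearsonr computes, here is a minimal sketch of the Pearson formula (sum of products of deviations divided by the square root of the product of the summed squared deviations) in plain Python. The helper name pearson_r is made up for illustration; the numbers are the rounded test data shown above, so the result only agrees with scipy to about four decimals.

```python
import math

# Test data copied from the answer above (rounded to six decimals).
dis = [0.620575, 0.124085, 0.286652, 0.569696, 0.081106,
       0.223516, 0.226528, 0.998798, 0.994585, 0.277520]
nflx = [0.122005, 0.380087, 0.218533, 0.511214, 0.114614,
        0.677468, 0.474243, 0.099523, 0.429352, 0.882989]


def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]   # deviations from the mean of x
    dy = [b - mean_y for b in y]   # deviations from the mean of y
    co_deviation = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return co_deviation / spread


r = pearson_r(dis, nflx)
print(round(r, 4))  # -0.2575
```

This only reproduces the coefficient; the p-value that pearsonr also returns is not computed here.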

It is not returning anything faulty. df.corr() returns a square correlation matrix, which is very useful if you have multiple features/variables. You can always extract the correlation between df['DIS'] and df['NFLX'] with loc or iloc:

>>> #test data
>>> newData.corr()
           DIS      NFLX
DIS   1.000000 -0.257523
NFLX -0.257523  1.000000 

>>> newData.corr().loc['DIS','NFLX']
-0.25752281938162824

>>> newData.corr().loc['NFLX','DIS']
-0.25752281938162824

>>> newData.corr().iloc[1, 0] # 2nd row and 1st column
-0.25752281938162824

>>> newData.corr().iloc[0, 1] # 1st row and 2nd column
-0.25752281938162824
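
If a single number is all that is needed, one option is to skip the matrix and call Series.corr on one column with the other as its argument. The sketch below rebuilds the test data from above (rounded to six decimals, so the last digits differ slightly from the output shown earlier).

```python
import pandas as pd

# Same test data as above, rounded to six decimals.
newData = pd.DataFrame({
    "DIS": [0.620575, 0.124085, 0.286652, 0.569696, 0.081106,
            0.223516, 0.226528, 0.998798, 0.994585, 0.277520],
    "NFLX": [0.122005, 0.380087, 0.218533, 0.511214, 0.114614,
             0.677468, 0.474243, 0.099523, 0.429352, 0.882989],
})

# Series.corr returns one float (Pearson by default) instead of a matrix.
r = newData["DIS"].corr(newData["NFLX"])
print(f"{r:.4f}")  # -0.2575
```

The value is the same off-diagonal entry that newData.corr() reports.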

You can make your correlation matrix look better instantly by using pandas style:

newData.corr().style.background_gradient(cmap='viridis')

If you want the correlation matrix to look even better, you can use seaborn's heatmap function, sns.heatmap. Here is an example:

import matplotlib.pyplot as plt
import seaborn as sns

sns.heatmap(newData.corr(), annot=True, linewidths=2, cmap='coolwarm')
plt.show()

Output: (heatmap image not included)

I have posted an answer; let me know if it works for you. If it does, please checkmark and upvote the answer. – Grayrigel, Sep 30 '20 at 08:06
Just to answer your comment below: `newData.corr()` seems to be a multilevel dataframe. I think you can drop the extra symbol level with `newData.droplevel(0, axis=0)` and `newData.columns = ['DIS','NFLX']`. [Check out this answer here](https://stackoverflow.com/questions/22233488/pandas-drop-a-level-from-a-multi-level-column-index). Also, yes, 0.02 is a very weak correlation, or no correlation at all. Usually, 0.50 or higher is considered a strong correlation. – Grayrigel, Sep 30 '20 at 09:12
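
The droplevel suggestion in the comment above could look roughly like the sketch below. The asker's real dataframe is not shown, so the frame here is hypothetical: the two-level column index, the level names "Attributes" and "Symbols", and the prices are all made up for illustration.

```python
import pandas as pd

# Hypothetical frame with two-level columns, e.g. ("Close", "DIS").
# Level names and prices are assumptions, not the asker's data.
columns = pd.MultiIndex.from_product(
    [["Close"], ["DIS", "NFLX"]], names=["Attributes", "Symbols"]
)
prices = pd.DataFrame(
    [[100.0, 500.0], [102.0, 495.0], [101.0, 510.0], [105.0, 505.0]],
    columns=columns,
)

corr = prices.corr()  # rows and columns both carry the two-level index

# Drop the outer level on each axis, then clear the leftover level name
# that would otherwise be printed above the rows and the columns.
flat = corr.droplevel(0, axis=0).droplevel(0, axis=1)
flat.index.name = None
flat.columns.name = None
print(flat)
```

Whether this removes the repeated "symbol" label depends on how the original dataframe was built, so treat it as a starting point.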

score 1 · Answer 2 · answered Sep 30 '20 at 07:45


No, your result is right. This is called a correlation matrix. Here is how to read it:

The diagonal entries are always one, because the correlation of a feature with itself is one.
The correlation between the two stocks is 0.0272.
If you had a third feature, for example, it would produce a 3×3 matrix, with one row and one column per feature.
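
The points above can be checked with a small sketch on synthetic data. The random numbers and the third column name "AAPL" are placeholders, not real prices.

```python
import numpy as np
import pandas as pd

# Synthetic random data; "AAPL" is only a placeholder third feature.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((50, 3)), columns=["DIS", "NFLX", "AAPL"])

corr = df.corr()
m = corr.to_numpy()

print(corr.shape)                      # (3, 3)
print(np.allclose(np.diag(m), 1.0))    # True: each feature vs. itself
print(np.allclose(m, m.T))             # True: corr(a, b) equals corr(b, a)
```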

Side note: a good way to present the correlation matrix is a heat map, which is easy to understand and visualize. You can check this question, which has a good answer that explains how to construct one: Correlation heatmap

answered by noob

Oh, alright, thank you. Is there any way to stop it from printing the word "symbol" twice in a row before the actual matrix? (as seen in my image above). And 0.02 is a weak correlation, correct? (since it's like 2%) – Robin Svensson Sep 30 '20 at 08:04
For the first question: you could use one of the techniques I mentioned for drawing the correlation matrix; the result is well organized and needs no further tidying. For the second question: an easy way to understand correlation is to plot the two columns in a scatter plot. For example, if the correlation between x and y is 0.70, then y tends to increase when x increases. Whether a value counts as weak depends on your needs, but here the general answer is yes. Please don't forget to upvote my answer if you find it helpful. – noob Sep 30 '20 at 08:10
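
The scatter plot suggested in the comment above might be drawn as in this sketch. It reuses the test data from the accepted answer rather than real prices; the Agg backend and the output file name are assumptions chosen so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Test data from the accepted answer, rounded to six decimals.
newData = pd.DataFrame({
    "DIS": [0.620575, 0.124085, 0.286652, 0.569696, 0.081106,
            0.223516, 0.226528, 0.998798, 0.994585, 0.277520],
    "NFLX": [0.122005, 0.380087, 0.218533, 0.511214, 0.114614,
             0.677468, 0.474243, 0.099523, 0.429352, 0.882989],
})

r = newData["DIS"].corr(newData["NFLX"])

# One point per row: a strong correlation would show up as a clear trend.
fig, ax = plt.subplots()
ax.scatter(newData["DIS"], newData["NFLX"])
ax.set_xlabel("DIS")
ax.set_ylabel("NFLX")
ax.set_title(f"Pearson r = {r:.2f}")
fig.savefig("dis_vs_nflx.png")  # hypothetical output file name
```

With a coefficient this close to zero the points should look like a loose cloud rather than a line.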
