If you just want just the correlation between two columns, you can use buit-in pearsonr
module in scipy
, which returns Pearson correlation and the p-value.
Try this:
#input test data
>>> newData
DIS NFLX
0 0.620575 0.122005
1 0.124085 0.380087
2 0.286652 0.218533
3 0.569696 0.511214
4 0.081106 0.114614
5 0.223516 0.677468
6 0.226528 0.474243
7 0.998798 0.099523
8 0.994585 0.429352
9 0.277520 0.882989
>>> from scipy import stats
>>> corr, p_value = stats.pearsonr(newData['DIS'].values, newData['NFLX'].values)
>>> print(corr)
-0.25752281938162824
It is not returning anything faulty. df.corr()
returns a square correlation matrix, which is very useful if have multiple features/variable. You can always extract the correlation between df['DIS']
and df['NFLX']
through loc
and iloc
:
>>> #test data
>>> newData.corr()
DIS NFLX
DIS 1.000000 -0.257523
NFLX -0.257523 1.000000
>>> newData.corr().loc['DIS','NFLX']
-0.25752281938162824
>>> newData.corr().loc['NFLX','DIS']
-0.25752281938162824
>>> newData.corr().iloc[1][0] # 2nd row and 1st column
-0.25752281938162824
>>>newData.corr().loc[0][1] # 1nd row and 2nd column
-0.25752281938162824
You can make your correlation matrix look better instantly by using pandas style
:
newData.corr().style.background_gradient(cmap='viridis')

If you want to make correlation matrix look even better. You can use seaborn's heatmap functionality called sns.heatmap
. Here is an example:
import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(newData.corr(),annot=True, lw=2, cmap='coolwarm')
plt.show()
Output:
