I have a dataframe with 145 rows and 135 columns. I want to compute Spearman's rank correlation for each column with respect to every other column (thus a 135x135 result). I then want to store these correlations in a new dataframe. (I have not done that yet.)
import pandas as pd
import numpy as np
overview = pd.read_excel(r'overview_20062022.xlsx')
df = pd.DataFrame(overview,
                  columns=['all the column names'])
from scipy.stats import spearmanr
# calculate spearman's correlation
for column in df.iteritems():
    coef, p = spearmanr(df.iteritems(), df.iteritems())
    print('Spearmans correlation coefficient: %.3f' % coef)
    # interpret the significance
    alpha = 0.05
    if p > alpha:
        print('Samples are uncorrelated (fail to reject H0) p=%.3f' % p)
    else:
        print('Samples are correlated (reject H0) p=%.3f' % p)
However, this now leads to NaN. Based on this question, I tried to use iteritems, but unfortunately this has not worked.
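
For reference, here is a minimal sketch of the result being aimed for, under some assumptions. `iteritems()` yields `(column name, Series)` pairs, so passing the iterator itself to `spearmanr` does not compare two columns; the method was also removed in pandas 2.0 in favour of `items()`. The sketch avoids it entirely: `DataFrame.corr(method="spearman")` gives the coefficient matrix directly, and an explicit loop over column pairs collects the p-values. The random data and the names `rho` and `pval` are hypothetical stand-ins, not the real spreadsheet.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical stand-in for the real 145 x 135 data (5 columns for brevity).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(145, 5)), columns=list("abcde"))

# Coefficients only: pandas builds the full square matrix in one call.
corr = df.corr(method="spearman")

# Coefficients and p-values: loop over each unordered pair of columns.
cols = df.columns
rho = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
pval = pd.DataFrame(np.zeros((len(cols), len(cols))), index=cols, columns=cols)

for a, b in combinations(cols, 2):
    pair = df[[a, b]].dropna()  # drop rows where either value is missing
    r, p = spearmanr(pair[a], pair[b])
    rho.loc[a, b] = rho.loc[b, a] = r
    pval.loc[a, b] = pval.loc[b, a] = p

# Boolean mask of pairs that are significant at the 5% level.
alpha = 0.05
significant = pval < alpha
```

With the real data, `rho` and `pval` would both be 135x135 dataframes indexed by column name. Dropping missing values per pair is a design choice here; columns that are constant or entirely empty would still produce NaN.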