I want to compute the Spearman correlation of each column against every other column of a pandas DataFrame. I only need the distribution of the correlations (a flat array), not a correlation matrix.
I know that I could use df.corr(method='spearman'); however, I only need the pairwise correlations, not the entire correlation matrix or its diagonal. I think this might speed up the computation, since I would only be computing (N^2 - N)/2 correlations instead of N^2.
However, this is just an assumption: since the matrix is symmetric, maybe pandas already computes only one half of the correlation matrix and fills in the rest accordingly.
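For reference, here is a minimal sketch of the matrix-based route mentioned above: build the full matrix with `corr` and keep only the entries strictly above the diagonal. The toy DataFrame and its column names are made up for illustration; `d` stands in for the real data. This says nothing about whether pandas skips the redundant half internally.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real DataFrame `d`
rng = np.random.default_rng(0)
d = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))

# Full N x N Spearman correlation matrix
corr = d.corr(method="spearman")

# Positions strictly above the diagonal (k=1 excludes the diagonal itself)
rows, cols = np.triu_indices(corr.shape[0], k=1)

# Flat array with one value per unordered column pair: (N^2 - N)/2 entries
corr_a = corr.to_numpy()[rows, cols]

print(corr_a.shape)
```

With N = 4 columns the array holds 6 values, ordered as (a,b), (a,c), (a,d), (b,c), (b,d), (c,d).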
So far, my (very inefficient) solution is:
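import pandas as pd
import scipy.stats as ss

# d is a pandas DataFrame
corr_a = []
for i, col1 in enumerate(d.columns):
    # Pair each column only with the columns after it, so every pair is visited once
    for col2 in d.columns[i + 1:]:
        # d[col] selects a column; d.loc[col] would look up a row label instead
        r, _ = ss.spearmanr(d[col1], d[col2])
        corr_a.append(r)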
Is there a built-in or vectorized API that would run this faster?