1

I have two NumPy arrays:

X.shape = (100, 10)
Y.shape = (100, 10)

I want to find the Pearson correlations between corresponding columns of X and Y, i.e.:

import numpy as np
from scipy.stats import pearsonr

def corr( X, Y ):
    return np.array([ pearsonr( x, y )[0] for x,y in zip( X.T, Y.T ) ] )    

corr( X, Y ).shape = (10, )

Is there a function for this? So far, all the functions I can find calculate correlation matrices. There is a pairwise correlation function in Matlab, so I'm pretty sure someone must have written one for Python.

I would like to avoid the example function above because it seems slow.

Ginger
  • pearsonr, if you just want the correlation, is just `np.corrcoef(x, y, rowvar=0, bias=?)`. However, np.corrcoef calculates also corr(X, X) and corr(Y, Y) in the joint correlation matrix. – Josef May 21 '14 at 17:22
  • Have you looked if http://stackoverflow.com/questions/19401078/efficient-columnwise-correlation-coefficient-calculation-with-numpy helps you? – ilya May 26 '14 at 14:33

2 Answers

5

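If columns are variables and rows are observations in X and Y, standardize each column and take the dot product. This gives the full cross-correlation matrix between the columns of X and the columns of Y; its diagonal holds the column-wise correlations you asked for:

# Standardize each column (zero mean, unit population standard deviation).
X = (X - X.mean(axis=0)) / X.std(axis=0)
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
# Shape (10, 10): entry [i, j] is corr(X[:, i], Y[:, j]); np.diag(pearson_r) gives the pairwise values.
pearson_r = np.dot(X.T, Y) / X.shape[0]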
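If only the matched column pairs are needed, the full matrix can be skipped. The following is a minimal sketch, not part of the original answer: the function name `columnwise_pearson` and the random test data are assumptions made for illustration. It multiplies the standardized arrays element-wise and averages over rows, which yields one coefficient per column pair.

```python
import numpy as np

def columnwise_pearson(X, Y):
    """Pearson r between column j of X and column j of Y (rows are observations).

    Hypothetical helper for illustration; assumes no column is constant.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # population std (ddof=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    # Mean of element-wise products of standardized columns equals Pearson r.
    return (Xs * Ys).mean(axis=0)

rng = np.random.default_rng(0)
X = rng.random((100, 10))
Y = rng.random((100, 10))

r = columnwise_pearson(X, Y)  # shape (10,)

# Cross-check against the X-vs-Y block of the joint correlation matrix.
full = np.corrcoef(X, Y, rowvar=False)[:10, 10:]
print(np.allclose(r, np.diag(full)))
```

This avoids computing the off-diagonal entries, which matters more as the number of columns grows.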

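To find the p-value, convert pearson_r to a t statistic with n - 2 degrees of freedom, where n = X.shape[0] is the number of observations:

t = pearson_r * np.sqrt(X.shape[0] - 2) / np.sqrt(1 - pearson_r ** 2)

The two-sided p-value is then 2 × P(T > |t|), where T follows a Student's t distribution with n - 2 degrees of freedom.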
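The p-value step can be sketched with `scipy.stats.t.sf`, the survival function of the t distribution. This is an illustrative sketch under stated assumptions: the helper name `pearson_pvalues` and the random data are made up here, and the correlations are assumed to be strictly between -1 and 1 so the denominator is nonzero.

```python
import numpy as np
from scipy import stats

def pearson_pvalues(r, n):
    """Two-sided p-values for Pearson correlations r computed from n observations.

    Hypothetical helper for illustration; assumes -1 < r < 1.
    """
    df = n - 2
    t = r * np.sqrt(df / (1.0 - r**2))
    # sf(|t|) is P(T > |t|); doubling gives the two-sided p-value.
    return 2.0 * stats.t.sf(np.abs(t), df)

rng = np.random.default_rng(0)
X = rng.random((100, 10))
Y = rng.random((100, 10))

# Column-wise correlations, computed here with pearsonr for reference.
r = np.array([stats.pearsonr(x, y)[0] for x, y in zip(X.T, Y.T)])
p = pearson_pvalues(r, X.shape[0])  # shape (10,)
```

The function accepts an array of any shape, so it works equally on the diagonal or on the full correlation matrix.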

2

I adapted this from scipy.stats.pearsonr:

import numpy as np
from scipy.stats import pearsonr

x = np.random.rand(100, 10)
y = np.random.rand(100, 10)

def corr( X, Y ):
    return np.array([ pearsonr( x, y )[0] for x,y in zip( X.T, Y.T) ] )

def pair_pearsonr(x, y, axis=0):
    mx = np.mean(x, axis=axis, keepdims=True)
    my = np.mean(y, axis=axis, keepdims=True)
    xm, ym = x-mx, y-my
    r_num = np.add.reduce(xm * ym, axis=axis)
    r_den = np.sqrt((xm*xm).sum(axis=axis) * (ym*ym).sum(axis=axis))
    r = r_num / r_den
    return r

np.allclose(pair_pearsonr(x, y, axis=0), corr(x, y))
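Since the question is about speed, one way to compare the loop with a vectorized version is `timeit`. The sketch below is an assumption-laden illustration: the names `corr_loop` and `corr_vectorized`, the array sizes, and the repeat count are chosen here, and the timings will depend on the machine and library versions.

```python
import timeit

import numpy as np
from scipy.stats import pearsonr

def corr_loop(X, Y):
    # One pearsonr call per column pair, as in the question.
    return np.array([pearsonr(x, y)[0] for x, y in zip(X.T, Y.T)])

def corr_vectorized(X, Y):
    # Center the columns, then form covariance / (norm * norm) per column.
    Xm = X - X.mean(axis=0)
    Ym = Y - Y.mean(axis=0)
    num = (Xm * Ym).sum(axis=0)
    den = np.sqrt((Xm**2).sum(axis=0) * (Ym**2).sum(axis=0))
    return num / den

rng = np.random.default_rng(0)
X = rng.random((100, 10))
Y = rng.random((100, 10))

same = np.allclose(corr_loop(X, Y), corr_vectorized(X, Y))
t_loop = timeit.timeit(lambda: corr_loop(X, Y), number=20)
t_vec = timeit.timeit(lambda: corr_vectorized(X, Y), number=20)

print(f"results agree: {same}")
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s (20 runs each)")
```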
HYRY