I have two matrices p (500x10000) and h (500x256) and I need to calculate the correlation in Python.
In Matlab, I used the corr() function without any problem : myCorrelation = corr( p, h );
In numpy, I tried np.corrcoef( p, h )
:
File "/usr/local/lib/python2.7/site-packages/numpy/core/shape_base.py", line 234, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
and I also tried np.correlate( p, h )
:
File "/usr/local/lib/python2.7/site-packages/numpy/core/numeric.py", line 975, in correlate
return multiarray.correlate2(a, v, mode)
ValueError: object too deep for desired array
Input :
pw.shape = (500, 10000)
hW.shape = (500, 256)
First, I have tried this :
myCorrelationMatrix, _ = scipy.stats.pearsonr( pw, hW )
Result :
myCorrelationMatrix, _ = scipy.stats.pearsonr( pw, hW )
File "/usr/local/lib/python2.7/site-packages/scipy/stats/stats.py", line 3019, in pearsonr
r_num = np.add.reduce(xm * ym)
ValueError: operands could not be broadcast together with shapes (500,10000) (500,256)
and tried this :
myCorrelationMatrix = corr2_coeff( pw, hW )
where corr2_coeff
according to 1 is :
def corr2_coeff(A,B) :
# Rowwise mean of input arrays & subtract from input arrays themeselves
A_mA = A - A.mean(1)[:,None]
B_mB = B - B.mean(1)[:,None]
# Sum of squares across rows
ssA = (A_mA**2).sum(1);
ssB = (B_mB**2).sum(1);
# Finally get corr coeff
return np.dot(A_mA,B_mB.T)/np.sqrt(np.dot(ssA[:,None],ssB[None]))
the result is this :
myCorrelationMatrix, _ = corr2_coeff( powerTraces, hW )
File "./myScript.py", line 175, in corr2_coeff
return np.dot(A_mA,B_mB.T)/np.sqrt(np.dot(ssA[:,None],ssB[None]))
ValueError: shapes (500,10000) and (256,500) not aligned: 10000 (dim 1) != 256 (dim 0)
and finally tried this :
myCorrelationMatrix = corr_coeff( pw, hW )
where corr_coeff
according to 2 is :
def corr_coeff(A,B) :
# Get number of rows in either A or B
N = B.shape[0]
# Store columnw-wise in A and B, as they would be used at few places
sA = A.sum(0)
sB = B.sum(0)
# Basically there are four parts in the formula. We would compute them one-by-one
p1 = N*np.einsum('ij,ik->kj',A,B)
p2 = sA*sB[:,None]
p3 = N*((B**2).sum(0)) - (sB**2)
p4 = N*((A**2).sum(0)) - (sA**2)
# Finally compute Pearson Correlation Coefficient as 2D array
pcorr = ((p1 - p2)/np.sqrt(p4*p3[:,None]))
# Get the element corresponding to absolute argmax along the columns
# out = pcorr[np.nanargmax(np.abs(pcorr),axis=0),np.arange(pcorr.shape[1])]
return pcorr
and the result is :
RuntimeWarning: invalid value encountered in sqrt
pcorr = ((p1 - p2)/np.sqrt(p4*p3[:,None]))
RuntimeWarning: invalid value encountered in divide
pcorr = ((p1 - p2)/np.sqrt(p4*p3[:,None]))
Update
This is not a duplicate, I've tried both methods you gave on Computing the correlation coefficient between two multi-dimensional arrays and Efficient pairwise correlation for two matrices of features and none of them worked.