Pearson Correlation between two Matrices

Question

I want to compute the correlation between every row of one matrix and every column of the other matrix. This is what I have come up with. It works, but I think it's pretty slow (it takes about 60 seconds for two matrices of shape 500x30000 and 500x16).

import math
import numpy as np

def matrix_corr(array1, array2):
    # Transpose so that the rows of array1 and the columns of array2 have the same length
    array1 = array1.T

    corr_pear = np.empty((array1.shape[0], array2.shape[1]))

    for n in range(array1.shape[0]):  # row vectors (30000 in the example)
        array1_mean = np.mean(array1[n, :])  # mean of the n-th row vector
        array1_squared = np.sum(np.square(array1[n, :] - array1_mean))
        for m in range(array2.shape[1]):  # column vectors of the hypothesis matrix
            array2_mean = np.mean(array2[:, m])  # column m of array2, not array1
            array2_squared = np.sum(np.square(array2[:, m] - array2_mean))
            corr_pear[n, m] = (np.inner(array1[n, :] - array1_mean, array2[:, m] - array2_mean) /
                               math.sqrt(array1_squared * array2_squared))

    return corr_pear

I guess there is a more Pythonic way to solve this.

I hope one of you can tell me how to tweak it.

Thanks!

It seems you needed two edits in the nested loopy code : `array2_mean = np.mean(array2[:,m])` and `array2_squared = np.sum(np.square(array2[:,m]-array2_mean))`. — Divakar, Dec 18 '16 at 10:49
Oh, I see, it's a typo. But I think this won't make my code faster... Any ideas? — SolingerMUC, Dec 18 '16 at 12:25
See the linked duplicate Q&A that has a vectorized solution and also includes runtime tests. — Divakar, Dec 18 '16 at 12:27
Thanks, I will check it. My solution without the typo now runs in around 10 seconds. You also think that this is still slow, right? — SolingerMUC, Dec 18 '16 at 12:30
That's to add new axis. None is an alias for `np.newaxis` : https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#numpy.newaxis — Divakar, Dec 18 '16 at 12:49
Could you please explain what your code does? I am a little confused by your use of `[:,None]` and by taking only the element at position (1) of `.mean` and `.sum` (what do they even output?). The notation `ssA[:,None],ssB[None]` is also confusing to me. I guess you are returning the division of two matrices, where the numerator contains all dot products as a matrix and the denominator contains the square roots of all sum-of-squares products, also as a matrix. Please explain a bit! Thank you — SolingerMUC, Dec 18 '16 at 12:55
That 1 in `.mean(1)` essentially means `.mean(axis=1)` that is we are averaging along `axis = 1`. For a 2D array that means along each row, hence the comment : `# Rowwise mean ..` there. With `ssA[:,None]` we are adding in a new axis at the end and with `ssB[None]`, we are adding a new axis at the front. — Divakar, Dec 18 '16 at 12:57
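
For reference, the vectorized approach described in the comments above can be sketched roughly as follows. This is a minimal sketch and not a copy of the linked answer: the function name `matrix_corr_vectorized` and the names `A_c` and `B_c` are made up for illustration, and it assumes both inputs are 2D arrays whose rows are the vectors to correlate and have the same length. `ssA` and `ssB` follow the notation used in the comments.

```python
import numpy as np

def matrix_corr_vectorized(A, B):
    """Pearson correlation between every row of A and every row of B.

    A has shape (n, k), B has shape (m, k); the result has shape (n, m).
    (Hypothetical helper name, used only for this sketch.)
    """
    # Row-wise mean: .mean(1) is .mean(axis=1); [:, None] turns the
    # resulting (n,) vector into an (n, 1) column so it broadcasts per row.
    A_c = A - A.mean(1)[:, None]
    B_c = B - B.mean(1)[:, None]

    # Row-wise sum of squared deviations, shapes (n,) and (m,)
    ssA = (A_c ** 2).sum(1)
    ssB = (B_c ** 2).sum(1)

    # Numerator: all pairwise dot products at once, shape (n, m).
    # Denominator: ssA[:, None] is (n, 1), ssB[None] is (1, m), so their
    # product broadcasts to (n, m) before the square root is taken.
    return (A_c @ B_c.T) / np.sqrt(ssA[:, None] * ssB[None])
```

With the shapes from the question (500x30000 and 500x16), the call would be `matrix_corr_vectorized(array1.T, array2.T)`, which gives a 30000x16 result, the same layout as `corr_pear` in the loop version.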
