
I'd like to correlate the columns of an m×n matrix with a 1×m array, which should give me a 1×n array back. At the moment I'm doing this a bit clumsily with:

c = np.corrcoef(X, y)[:-1,-1]

The correlations I want are in the last column of the result, excluding the last row, which is the array's correlation with itself (so r = 1.0).

This works, but I need to do it on quite big matrices, and that is when it becomes too computationally heavy and my computer gives up.

For example the largest matrix I am doing this for has the size:

48×290400 (= X) and 48×1 (= y), where I want to end up with 290400 r-values

This works fine in Matlab, but not in Python using np.corrcoef. Does anyone have a good solution for this?
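A likely reason it gives up (an assumption, based on how np.corrcoef handles two inputs): when given two arrays, np.corrcoef stacks them and returns the correlation matrix of *all* variables against all variables, so correlating 290400 columns plus y at once would mean materializing a (290401)×(290401) float64 matrix. A quick back-of-the-envelope estimate:

```python
# Rough memory estimate for the full correlation matrix np.corrcoef
# would build when treating all 290400 columns plus y as variables.
n = 290400                       # number of columns of X
entries = (n + 1) ** 2           # (n+1) x (n+1) output matrix
gigabytes = entries * 8 / 1e9    # float64 = 8 bytes per entry
print(round(gigabytes))          # → 675
```

Hundreds of gigabytes for a result of which only one column is needed, which is why a vectorized one-vs-many approach pays off.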

Cheers Daniel

Divakar
Daniel Lindh

1 Answer


We could use corr2_coeff from this post after transposing the input arrays -

corr2_coeff(a.T,b.T).ravel()
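For reference, here is a minimal sketch of such a vectorized row-wise Pearson correlation (the linked post's exact implementation may differ): it correlates every row of A with every row of B through one centered matrix product instead of building the full stacked correlation matrix.

```python
import numpy as np

def corr2_coeff(A, B):
    # Center each row (each row is one variable, columns are observations)
    A_mA = A - A.mean(axis=1, keepdims=True)
    B_mB = B - B.mean(axis=1, keepdims=True)
    # Row-wise sums of squares (the normalization terms)
    ssA = (A_mA ** 2).sum(axis=1)
    ssB = (B_mB ** 2).sum(axis=1)
    # Pearson r for every (row of A, row of B) pair via a single matmul
    return A_mA @ B_mB.T / np.sqrt(ssA[:, None] * ssB[None, :])
```

With a of shape (3, 5) and b of shape (3, 1), corr2_coeff(a.T, b.T) returns shape (5, 1), hence the .ravel() to get a flat array of r-values.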

Sample run -

In [160]: a = np.random.rand(3, 5)

In [161]: b = np.random.rand(3, 1)

# Proposed in the question
In [162]: np.corrcoef(a.T, b.T)[:-1,-1]
Out[162]: array([-0.0716,  0.1905,  0.9699,  0.7482, -0.1511])

# Proposed in this post
In [163]: corr2_coeff(a.T,b.T).ravel()
Out[163]: array([-0.0716,  0.1905,  0.9699,  0.7482, -0.1511])

Runtime test -

In [171]: a = np.random.rand(48, 10000)

In [172]: b = np.random.rand(48, 1)

In [173]: %timeit np.corrcoef(a.T, b.T)[:-1,-1]
1 loops, best of 3: 619 ms per loop

In [174]: %timeit corr2_coeff(a.T,b.T).ravel()
1000 loops, best of 3: 1.72 ms per loop

In [176]: 619.0/1.72
Out[176]: 359.8837209302326

Massive 360x speedup there!

Scaling it further -

In [239]: a = np.random.rand(48, 29040)

In [240]: b = np.random.rand(48, 1)

In [241]: %timeit np.corrcoef(a.T, b.T)[:-1,-1]
1 loops, best of 3: 5.19 s per loop

In [242]: %timeit corr2_coeff(a.T,b.T).ravel()
100 loops, best of 3: 8.09 ms per loop

In [244]: 5190.0/8.09
Out[244]: 641.5327564894932

640x+ speedup on this bigger dataset, and it should scale even better as we approach the actual dataset sizes!

Divakar
    Thanks for the simulations as well! Yeah, I basically went from not being able to run it at all to having my correlations in no time. Very useful – Daniel Lindh Mar 08 '17 at 18:52