Suppose I have a 2D array ma with shape (n_rows_ma, n_cols_ma) and a 2D array mb with shape (n_rows_mb, n_cols_mb). I want to calculate the correlation between every row in ma and every row in mb. The easiest way may be:

import numpy as np
correlation = np.corrcoef(ma,mb)[:n_rows_ma,n_rows_ma:]

But this is too inefficient. Is there a more efficient way?

newbie

1 Answer

The formula behind corrcoef is straightforward to implement. By implementing it ourselves, we can compute only the block we actually need:
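For reference, this is the standard Pearson correlation between row i of ma and row j of mb. The symbols a, b, and r are notation introduced here for illustration (a for ma, b for mb, bars for row means); they do not appear in the original post:

```latex
r_{ij} = \frac{\sum_{k} (a_{ik} - \bar{a}_i)\,(b_{jk} - \bar{b}_j)}
              {\sqrt{\sum_{k} (a_{ik} - \bar{a}_i)^2}\;\sqrt{\sum_{k} (b_{jk} - \bar{b}_j)^2}}
```

Centering each row and scaling it to unit length turns the numerator into a plain dot product, which is what the code does.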

>>> import numpy as np
>>> 
>>> ma = np.random.random((5,6))
>>> mb = np.random.random((3,6))
>>> 
>>> za = ma - ma.mean(axis=1, keepdims=True)
>>> za /= np.sqrt(np.einsum('ij,ij->i', za, za))[:, None]
>>> zb = mb - mb.mean(axis=1, keepdims=True)
>>> zb /= np.sqrt(np.einsum('ij,ij->i', zb, zb))[:, None]
>>> 
>>> cc = np.einsum('ik,jk', za, zb)
>>> 
>>> np.allclose(cc, np.corrcoef(np.r_[ma, mb])[:5, 5:])
True
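The same steps can be wrapped in a reusable function. This is a minimal sketch, assuming both arrays have the same number of columns; the name rowwise_corr is hypothetical, and it swaps the einsum calls for np.linalg.norm and a matrix product, which should give the same result:

```python
import numpy as np

def rowwise_corr(a, b):
    """Pearson correlation between every row of a and every row of b.

    Hypothetical helper: returns an array of shape (a.shape[0], b.shape[0]).
    Assumes a and b are 2D with the same number of columns.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Center each row on its own mean (creates new arrays, inputs stay untouched).
    za = a - a.mean(axis=1, keepdims=True)
    zb = b - b.mean(axis=1, keepdims=True)
    # Scale each centered row to unit Euclidean length.
    za /= np.linalg.norm(za, axis=1, keepdims=True)
    zb /= np.linalg.norm(zb, axis=1, keepdims=True)
    # Dot products of unit-length centered rows are the correlations.
    return za @ zb.T
```

A constant row has zero norm after centering, so its correlations come out as nan, much as they do with np.corrcoef.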
Paul Panzer