1

I have two matrices with the same shape:

import numpy as np
from scipy.stats import pearsonr
np.random.seed(10)
a = np.random.random(30).reshape(10,3)
b = np.random.random(30).reshape(10,3)

i.e., 10 rows and three columns. I need the rolling correlation of the columns with the same column index in each matrix. The slow way is:

def roll_corr((a, b), window):
    out = np.ones_like(a)*np.nan
    for i in xrange(window-1, a.shape[0]):
        #print "%d --> %d" % ((i-(window-1)), i)
        for j in xrange(a.shape[1]):
            out[i, j] = pearsonr(
                a[(i-(window-1)):(i), j], b[(i-(window-1)):(i), j]
            )[0]
    return out

With results for roll_corr((a, b), 5) as I want,

array([[        nan,         nan,         nan],
       [        nan,         nan,         nan],
       [        nan,         nan,         nan],
       [        nan,         nan,         nan],
       [ 0.28810753,  0.27836622,  0.88397851],
       [-0.04076151,  0.45254981,  0.83259104],
       [ 0.62262963, -0.4188768 ,  0.35479134],
       [ 0.13130652, -0.91441413, -0.21713372],
       [ 0.54327228, -0.91390053, -0.84033286],
       [ 0.45268257, -0.95245888, -0.50107515]])

The question is: is there a more idiomatic numpy way to do this? Vectorized? Strides trick? Numba?

I have searched but have not found this. I no not want to use pandas; must be numpy.

Jonathan
  • 1,287
  • 14
  • 17

2 Answers2

3

We can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows to get sliding windows. More info on use of as_strided based view_as_windows.

Thus, we would have one solution based on corr2_coeff_rowwise, like so -

from skimage.util import view_as_windows

A = view_as_windows(a,(window,1))[...,0]
B = view_as_windows(b,(window,1))[...,0]

A_mA = A - A.mean(-1, keepdims=True)
B_mB = B - B.mean(-1, keepdims=True)

## Sum of squares across rows
ssA = (A_mA**2).sum(-1) # or better : np.einsum('ijk,ijk->ij',A_mA,A_mA)
ssB = (B_mB**2).sum(-1) # or better : np.einsum('ijk,ijk->ij',B_mB,B_mB)

## Finally get corr coeff
out = np.full(a.shape, np.nan)
out[window-1:] = np.einsum('ijk,ijk->ij',A_mA,B_mB)/np.sqrt(ssA*ssB)
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • On large amount of data using np.lib.stride_tricks.as_strided may lead to Memory error. So this solution is useful but only for comparatively small arrays. – Prokhozhii Jun 27 '20 at 11:02
-1

You can use pandas.rolling_curr() function to generate the correlation. I can't see why they give different outputs, though.

import numpy as np
import pandas as pd
from scipy.stats import pearsonr

np.random.seed(10)
a = np.random.random(30).reshape(10,3)
b = np.random.random(30).reshape(10,3)

a_1 = pd.DataFrame(a)
b_1 = pd.DataFrame(b)

print pd.rolling_corr(arg1=a_1, arg2=b_1, window=5)

# OUTPUT
===============================
   0         1         2
0  NaN       NaN       NaN 
1  NaN       NaN       NaN
2  NaN       NaN       NaN
3  NaN       NaN       NaN
4  0.441993  0.254435  0.707801 
5  0.314446  0.233392  0.425191
6  0.243755 -0.441434  0.352801
7  0.281139 -0.864357 -0.192409
8  0.543645 -0.925822 -0.563786
9  0.445918 -0.784808 -0.532234
Oğuz K
  • 164
  • 3
  • 7
  • 1
    In newer versions of pandas, there is pd.rolling() function which is a bit different – Oğuz K Sep 04 '18 at 20:28
  • 1
    Do you know how this would be done with the new `rolling` syntax? That syntax is a method of a single dataframe; i.e., it is not pd.rolling(), it is df.rolling()...; so here something like a_1.rolling(5,5)... but how to make it column by column with b1? – Jonathan Sep 05 '18 at 13:14
  • I think one way to do it can be to vectorize both matrices as (30,1) then concatenate them. I am not so familiar with it, though. – Oğuz K Sep 06 '18 at 16:51