Numpy find covariance of two 2-dimensional ndarray

Question

I am new to numpy and am stuck at this problem. I have two 2-dimensional numpy array such as

x = numpy.random.random((10, 5))
y = numpy.random.random((10, 5))

I want to use numpy cov function to find covariance of these two ndarrays row wise. i.e., for above example the output array should consist of 10 elements each denoting the covariance of corresponding rows of the ndarrays. I know I can do this by traversing the rows and finding the covariance of two 1D arrays but it isn't pythonic.

Edit1: The covariance of two array denotes the element at 0, 1 index.

Edit2: Currently this is my implementation

s = numpy.empty((x.shape[0], 1))
for i in range(x.shape[0]):
    s[i] = numpy.cov(x[i], y[i])[0][1]

score 2 · Accepted Answer · answered Sep 06 '18 at 08:04

2

Use the definition of the covariance: E(XY) - E(X)E(Y).

import numpy as np
x = np.random.random((10, 5))
y = np.random.random((10, 5))

n = x.shape[1]
cov_bias = np.mean(x * y, axis=1) - np.mean(x, axis=1) * np.mean(y, axis=1))
cov_bias * n / (n-1)

Note that cov_bias corresponds to the result of numpy.cov(bias=True).

answered Sep 06 '18 at 08:04

Kota Mori

6,510
1
21
25

This is the fastest approach, I have seen till now. – Paras Sep 06 '18 at 08:34
Good one indeed! You can optimize it further with : `np.einsum('ij,ij->i',x,y)/x.shape[1]` to replace `np.mean(x * y, axis=1)`. – Divakar Sep 06 '18 at 08:44

score 1 · Answer 2 · answered Sep 06 '18 at 08:04

Here's one using the definition of covariance and inspired by corr2_coeff_rowwise -

def covariance_rowwise(A,B):
    # Rowwise mean of input arrays & subtract from input arrays themeselves
    A_mA = A - A.mean(-1, keepdims=True)
    B_mB = B - B.mean(-1, keepdims=True)

    # Finally get covariance
    N = A.shape[1]
    return np.einsum('ij,ij->i',A_mA,B_mB)/(N-1)

Sample run -

In [66]: np.random.seed(0)
    ...: x = np.random.random((10, 5))
    ...: y = np.random.random((10, 5))

In [67]: s = np.empty((x.shape[0]))
    ...: for i in range(x.shape[0]):
    ...:     s[i] = np.cov(x[i], y[i])[0][1]

In [68]: np.allclose(covariance_rowwise(x,y),s)
Out[68]: True

rocksportrocker · Answer 3 · 2018-09-06T08:11:10.377

This works, but I'm not sure if it is faster for larger matrices x and y, the call numpy.cov(x, y) computes many entries we discard with numpy.diag:

x = numpy.random.random((10, 5))
y = numpy.random.random((10, 5))

# with loop
for (xi, yi) in zip(x, y):
    print(numpy.cov(xi, yi)[0][1])

# vectorized
cov_mat = numpy.cov(x, y)
covariances = numpy.diag(cov_mat, x.shape[0])
print(covariances)

I also did some timing for square matrices of size n x n:

import time
import numpy

def run(n):
    x = numpy.random.random((n, n))
    y = numpy.random.random((n, n))

    started = time.time()
    for (xi, yi) in zip(x, y):
        numpy.cov(xi, yi)[0][1]

    needed_loop = time.time() - started

    started = time.time()
    cov_mat = numpy.cov(x, y)
    covariances = numpy.diag(cov_mat, x.shape[0])
    needed_vectorized = time.time() - started
    print(
        f"n={n:4d} needed_loop={needed_loop:.3f} s "
        f"needed_vectorized={needed_vectorized:.3f} s"
    )

for n in (100, 200, 500, 600, 700, 1000, 2000, 3000):
    run(n)

output on my slow MacBook Air is

n= 100 needed_loop=0.006 s needed_vectorized=0.001 s
n= 200 needed_loop=0.011 s needed_vectorized=0.003 s
n= 500 needed_loop=0.033 s needed_vectorized=0.023 s
n= 600 needed_loop=0.041 s needed_vectorized=0.039 s
n= 700 needed_loop=0.043 s needed_vectorized=0.049 s
n=1000 needed_loop=0.061 s needed_vectorized=0.130 s
n=2000 needed_loop=0.137 s needed_vectorized=0.742 s
n=3000 needed_loop=0.224 s needed_vectorized=2.264 s

so the break even point is around n=600

score 0 · Answer 4 · answered Sep 06 '18 at 08:12

0

Pick the diagonal vector of cov(x,y) and expand dims:

numpy.expand_dims(numpy.diag(numpy.cov(x,y),x.shape[0]),1)

answered Sep 06 '18 at 08:12

Ernest S Kirubakaran

1,524
12
16

Numpy find covariance of two 2-dimensional ndarray

4 Answers4