
I am new to numpy and am stuck at this problem. I have two 2-dimensional numpy arrays such as

x = numpy.random.random((10, 5))
y = numpy.random.random((10, 5))

I want to use the numpy cov function to find the covariance of these two ndarrays row-wise, i.e. for the above example the output array should consist of 10 elements, each denoting the covariance of the corresponding rows of the two ndarrays. I know I can do this by traversing the rows and finding the covariance of two 1D arrays, but it isn't pythonic.

Edit1: By the covariance of two arrays I mean the element at index (0, 1) of the 2x2 matrix returned by numpy.cov.
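
For example, for two 1D arrays the value I am after is the off-diagonal entry of the 2x2 matrix that numpy.cov returns:

a = numpy.random.random(5)
b = numpy.random.random(5)

# numpy.cov(a, b) returns the 2x2 matrix
# [[var(a), cov(a, b)],
#  [cov(a, b), var(b)]]
# so the value I want is the off-diagonal entry:
print(numpy.cov(a, b)[0, 1])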

Edit2: Currently this is my implementation:

s = numpy.empty((x.shape[0], 1))
for i in range(x.shape[0]):
    s[i] = numpy.cov(x[i], y[i])[0][1]
Paras

4 Answers


Use the definition of covariance: Cov(X, Y) = E(XY) - E(X)E(Y).

import numpy as np
x = np.random.random((10, 5))
y = np.random.random((10, 5))

n = x.shape[1]
cov_bias = np.mean(x * y, axis=1) - np.mean(x, axis=1) * np.mean(y, axis=1)
cov = cov_bias * n / (n - 1)

Note that cov_bias corresponds to the result of numpy.cov(bias=True); multiplying by n / (n - 1) converts it to the unbiased estimate that numpy.cov returns by default.
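
A quick sanity check of that correspondence (a minimal self-contained sketch):

import numpy as np

np.random.seed(0)
x = np.random.random((10, 5))
y = np.random.random((10, 5))

n = x.shape[1]
cov_bias = np.mean(x * y, axis=1) - np.mean(x, axis=1) * np.mean(y, axis=1)

# Compare against numpy.cov row by row, biased and unbiased.
biased_rowwise = np.array([np.cov(x[i], y[i], bias=True)[0, 1] for i in range(x.shape[0])])
unbiased_rowwise = np.array([np.cov(x[i], y[i])[0, 1] for i in range(x.shape[0])])

print(np.allclose(cov_bias, biased_rowwise))                  # True
print(np.allclose(cov_bias * n / (n - 1), unbiased_rowwise))  # True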

Kota Mori

Here's one approach using the definition of covariance, inspired by corr2_coeff_rowwise -

import numpy as np

def covariance_rowwise(A, B):
    # Row-wise mean of the input arrays & subtract from the input arrays themselves
    A_mA = A - A.mean(-1, keepdims=True)
    B_mB = B - B.mean(-1, keepdims=True)

    # Finally get the row-wise covariance (unbiased, ddof=1)
    N = A.shape[1]
    return np.einsum('ij,ij->i', A_mA, B_mB) / (N - 1)

Sample run -

In [66]: np.random.seed(0)
    ...: x = np.random.random((10, 5))
    ...: y = np.random.random((10, 5))

In [67]: s = np.empty((x.shape[0]))
    ...: for i in range(x.shape[0]):
    ...:     s[i] = np.cov(x[i], y[i])[0][1]

In [68]: np.allclose(covariance_rowwise(x,y),s)
Out[68]: True
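
As a side note, the einsum call computes the same row-wise sum of products as an explicit elementwise multiply-and-sum; it just avoids materializing the intermediate product array. A minimal equivalence check:

import numpy as np

np.random.seed(0)
A = np.random.random((10, 5))
B = np.random.random((10, 5))

# Subtract row-wise means, as in covariance_rowwise above
A_mA = A - A.mean(-1, keepdims=True)
B_mB = B - B.mean(-1, keepdims=True)

via_einsum = np.einsum('ij,ij->i', A_mA, B_mB) / (A.shape[1] - 1)
via_mul = (A_mA * B_mB).sum(axis=1) / (A.shape[1] - 1)
print(np.allclose(via_einsum, via_mul))  # True
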
Divakar

This works, but I'm not sure whether it is faster for larger matrices x and y: the call numpy.cov(x, y) computes many entries that we then discard with numpy.diag:

import numpy

x = numpy.random.random((10, 5))
y = numpy.random.random((10, 5))

# with loop
for (xi, yi) in zip(x, y):
    print(numpy.cov(xi, yi)[0][1])

# vectorized
cov_mat = numpy.cov(x, y)
covariances = numpy.diag(cov_mat, x.shape[0])
print(covariances)

I also did some timing for square matrices of size n x n:

import time
import numpy

def run(n):
    x = numpy.random.random((n, n))
    y = numpy.random.random((n, n))

    started = time.time()
    for (xi, yi) in zip(x, y):
        numpy.cov(xi, yi)[0][1]

    needed_loop = time.time() - started

    started = time.time()
    cov_mat = numpy.cov(x, y)
    covariances = numpy.diag(cov_mat, x.shape[0])
    needed_vectorized = time.time() - started
    print(
        f"n={n:4d} needed_loop={needed_loop:.3f} s "
        f"needed_vectorized={needed_vectorized:.3f} s"
    )

for n in (100, 200, 500, 600, 700, 1000, 2000, 3000):
    run(n)

The output on my slow MacBook Air is:

n= 100 needed_loop=0.006 s needed_vectorized=0.001 s
n= 200 needed_loop=0.011 s needed_vectorized=0.003 s
n= 500 needed_loop=0.033 s needed_vectorized=0.023 s
n= 600 needed_loop=0.041 s needed_vectorized=0.039 s
n= 700 needed_loop=0.043 s needed_vectorized=0.049 s
n=1000 needed_loop=0.061 s needed_vectorized=0.130 s
n=2000 needed_loop=0.137 s needed_vectorized=0.742 s
n=3000 needed_loop=0.224 s needed_vectorized=2.264 s

so the break-even point is around n = 600.
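
A likely reason for that crossover (my reading of it): numpy.cov(x, y) stacks the two inputs and computes a full (2n) x (2n) covariance matrix, of which only n entries are kept, while the loop only ever looks at one pair of rows at a time. A small sketch of the wasted work:

import numpy

n = 1000
x = numpy.random.random((n, n))
y = numpy.random.random((n, n))

cov_mat = numpy.cov(x, y)
print(cov_mat.shape)                 # (2000, 2000): 4 million entries computed
print(numpy.diag(cov_mat, n).shape)  # (1000,): the entries we actually keep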

rocksportrocker

Pick the relevant diagonal of cov(x, y) and expand the dimensions:

numpy.expand_dims(numpy.diag(numpy.cov(x, y), x.shape[0]), 1)
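
For reference, this gives a column vector with the same shape as s in the question (a minimal check, assuming the (10, 5) arrays from the question):

import numpy

x = numpy.random.random((10, 5))
y = numpy.random.random((10, 5))

result = numpy.expand_dims(numpy.diag(numpy.cov(x, y), x.shape[0]), 1)
print(result.shape)  # (10, 1)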
Ernest S Kirubakaran