Calculating the covariance of a 1000 5x5 matrices in matlab

Question

I have a 1000 5x5 matrices (Xm) like this:

enter image description here

Each $(x_ij)m$ is a point estimate drawn from a distribution. I'd like to calculate the covariance cov of each $x{ij}$, where i=1..n, and j=1..n in the direction of the red arrow.

For example the variance of $X_m$ is `var(X,0,3) which gives a 5x5 matrix of variances. Can I calculate the covariance in the same way?

Attempt at answer

So far I've done this:

for m=1:1000
Xm_new(m,:)=reshape(Xm(:,:,m)',25,1);
end

cov(Xm_new)
spy(Xm_new) gives me this unusual looking sparse matrix:

enter image description here

score 5 · Accepted Answer · edited May 23 '17 at 12:33

5

If you look at cov (edit cov in the command window) you might see why it doesn't support multi-dimensional arrays. It perform a transpose and a matrix multiplication of the input matrices: xc' * xc. Both operations don't support multi-dimensional arrays and I guess whoever wrote the function decided not to do the work to generalize it (it still might be good to contact the Mathworks however and make a feature request).

In your case, if we take the basic code from cov and make a few assumptions, we can write a covariance function M-file the supports 3-D arrays:

function x = cov3d(x)
% Based on Matlab's cov, version 5.16.4.10

[m,n,p] = size(x);
if m == 1
    x = zeros(n,n,p,class(x));
else
    x = bsxfun(@minus,x,sum(x,1)/m);
    for i = 1:p
        xi = x(:,:,i);
        x(:,:,i) = xi'*xi;
    end
    x = x/(m-1);
end

Note that this simple code assumes that x is a series of 2-D matrices stacked up along the third dimension. And the normalization flag is 0, the default in cov. It could be exapnded to multiple dimensions like var with a bit of work. In my timings, it's over 10 times faster than a function that calls cov(x(:,:,i)) in a for loop.

Yes, I used a for loop. There may or may not be faster ways to do this, but in this case for loops are going to be faster than most schemes, especially when the size of your array is not known a priori.

edited May 23 '17 at 12:33

Community

1
1

answered Jul 02 '13 at 21:43

horchler

18,384
4
37
73

Ah very nice! Thank you! I am however looking for a 5x5 covariance matrix based on a 1000 observations (stacked ontop of each other in the 3rd dimension) of each variable x_ij. So for example Xm(1,1,:) are observations of the same variable, etc. Is this what this function is doing? – HCAI Jul 02 '13 at 21:49
It's equivalent to calling the `cov` function 1000 times for a 5-by-5 input and storing the 5-by-5 output: `Xm_out(:,:,1)=cov(Xm(:,:,1))`, `Xm_out(:,:,2)=cov(Xm(:,:,2))`, ..., `Xm_out(:,:,999)=cov(Xm(:,:,999))`, `Xm_out(:,:,1000)=cov(Xm(:,:,1000))`. – horchler Jul 02 '13 at 21:55
Oh ah... I'm trying to find the covariance for each x_ij, i\ne=j, such that x12 is the covariance between the column vectors x1 and x2 going in the direction of the red arrow. Have I not asked the correct question? – HCAI Jul 03 '13 at 18:09
X(1,1,:) are iid of x1 and X(1,2,:) are iid of x2. I'd like to create the 5x5 matrix of covariances Y where each entry y_ij, was the covariance between X(i,j,:). Is this possible? – HCAI Jul 03 '13 at 18:11
Yes, now I am confused as to what you want. My answer is the direct analog to `var(X,0,3)` that you mentioned in your question. It sounds like you want to find the covariance of the 1-by-1-by-1000 (same as 1-by-1000 really) vector `X(i,j,:)`? As per the help for `cov`, `cov(x)` for vector `x` is equivalent to the variance, i.e., `var`. – horchler Jul 03 '13 at 18:44
Thanks, I'd like to find the covariance thus: `for i=1:5 for j=1:5 find covariance between X(i,j,:)and X(i,j,:) end end`. Does this make sense? The simple reason I don't use the for loop is because I'm told X(i,j,:) must be 2D. – HCAI Jul 04 '13 at 14:42
Ok, let `v=rand(1,1,1000);` be your `X(i,j,:)` for `i=1,j=1`. Now, y1=cov(v)` gives the error "Inputs must be 2-D." So do this: `y1=cov(squeeze(v))`. However, this is equivalent to `y2=var(v,0,3)` (or simply `y2=var(v)` in this case). Or are you asking about `y3=cov(squeeze(v),squeeze(v))`? That's just equal to `[y2 y2;y2 y2]` as you're calculating the covariance of something with respect to itself. That's what the "covariance between `X(i,j,:)` and `X(i,j,:)`" means. It's a bit nonsensical. From how your data is stored and based on what you've asked, my code still makes the most sense to me. – horchler Jul 04 '13 at 16:15

score 2 · Answer 2 · answered Sep 12 '14 at 20:06

The answer below also works for a rectangular matrix xi=x(:,:,i)

function xy = cov3d(x)

[m,n,p] = size(x);
if m == 1
    x = zeros(n,n,p,class(x));
else
    xc = bsxfun(@minus,x,sum(x,1)/m);
    for i = 1:p
        xci = xc(:,:,i);
        xy(:,:,i) = xci'*xci;
    end
    xy = xy/(m-1);
end

My answer is very similar to horchler, however horchler's code does not work with rectangular matrices xi (whose dimensions are different from xi'*xi dimensions).

Calculating the covariance of a 1000 5x5 matrices in matlab

2 Answers2

Linked