5

Possible Duplicate:
MATLAB is running out of memory but it should not be

I want to perform PCA on a huge data set of points. To be more specific, I have size(dataPoints) = [329150 132], where 329150 is the number of data points and 132 is the number of features.

I want to extract the eigenvectors and their corresponding eigenvalues so that I can perform PCA reconstruction.

However, when I use the princomp function (i.e. [eigenVectors projectedData eigenValues] = princomp(dataPoints);), I obtain the following error:

>> [eigenVectors projectedData eigenValues] = princomp(pointsData);
Error using svd
Out of memory. Type HELP MEMORY for your options.

Error in princomp (line 86)
[U,sigma,coeff] = svd(x0,econFlag); % put in 1/sqrt(n-1) later

However, if I am using a smaller data set, I have no problem.

How can I perform PCA on my whole dataset in MATLAB? Has anyone encountered this problem?

Edit:

I have modified the princomp function and tried to use svds instead of svd, but I obtain pretty much the same error. I have pasted the error below:

Error using horzcat
Out of memory. Type HELP MEMORY for your options.

Error in svds (line 65)
B = [sparse(m,m) A; A' sparse(n,n)];

Error in princomp (line 86)
[U,sigma,coeff] = svds(x0,econFlag); % put in 1/sqrt(n-1) later
Simon

4 Answers

5

Solution based on Eigen Decomposition

You can first compute PCA on X'X as @david said. Specifically, see the script below:

sz = [329150 132];
X = rand(sz);            % stand-in for the real data, one point per row

[V, D] = eig(X.' * X);   % 132-by-132 eigenproblem instead of an SVD of X

Here, V holds the right singular vectors of X, which are the principal vectors if you put your data vectors in rows. The eigenvalues on the diagonal of D are proportional to the variances along each direction (exactly so, up to a factor of 1/(n-1), when the columns of X have been mean-centered first). The singular values, which correspond to the standard deviations, are computed as the square root of the eigenvalues:

S = sqrt(D);

Then, the left singular vectors, U, are computed using the formula X = USV'. Note that U refers to the principal components if your data vectors are in columns.

U = X*V*S^(-1);

Let us reconstruct the original data matrix and see the L2 reconstruction error:

X2 = U*S*V';
L2ReconstructionError = norm(X(:)-X2(:))

It is almost zero:

L2ReconstructionError =
  6.5143e-012

If your data vectors are in columns and you want to convert your data into eigenspace coefficients, you should do U.'*X.

This code snippet takes around 3 seconds on my moderate 64-bit desktop.
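For PCA proper, the data is usually mean-centered and the eigenvalues sorted in descending order, which the script above leaves out. The following is a minimal sketch of that variant, not the exact code from this answer; the sizes, the value of k, and the variable names are illustrative assumptions.

```matlab
% Stand-in data: n points in rows, p features in columns (illustrative sizes).
n = 5000; p = 132;
X = rand(n, p);

mu = mean(X, 1);                  % 1-by-p vector of feature means
Xc = bsxfun(@minus, X, mu);       % center every feature
C  = (Xc.' * Xc) / (n - 1);       % p-by-p sample covariance matrix

[V, D] = eig(C);                  % eigenvectors are the columns of V
[eigenValues, order] = sort(diag(D), 'descend');
eigenVectors = V(:, order);       % principal directions, largest variance first

k = 10;                           % hypothetical number of components to keep
W = eigenVectors(:, 1:k);
projected = Xc * W;                          % n-by-k scores
Xhat = bsxfun(@plus, projected * W.', mu);   % rank-k reconstruction of X
relErr = norm(X - Xhat, 'fro') / norm(X, 'fro');
```

Only the p-by-p matrix C is decomposed, so the cost of eig does not depend on the number of points.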

Solution based on Randomized PCA

Alternatively, you can use a faster approximate method based on randomized PCA. Please see my answer on Cross Validated. You can directly compute fsvd and get U and V instead of using eig.

Randomized PCA is worth using if the data is too big, but I think the previous approach is sufficient for the size you gave.
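The fsvd function itself is not reproduced here. As a rough illustration of the general idea behind randomized SVD, a basic random range-finder could look like the sketch below; it is not the fsvd code from the linked answer, and the oversampling amount and variable names are assumptions.

```matlab
% Stand-in data (illustrative sizes).
n = 5000; p = 132;
X = rand(n, p);

k = 10;                               % number of components wanted
oversample = 5;                       % a few extra random directions (assumed value)

Omega = randn(p, k + oversample);     % random test matrix
Y = X * Omega;                        % sample the column space of X
[Q, ~] = qr(Y, 0);                    % orthonormal basis for that sample
B = Q.' * X;                          % small (k+oversample)-by-p matrix
[Ub, S, V] = svd(B, 'econ');          % exact SVD of the small matrix

U = Q * Ub(:, 1:k);                   % approximate left singular vectors
S = S(1:k, 1:k);                      % approximate leading singular values
V = V(:, 1:k);                        % approximate right singular vectors
```

The accuracy of such an approximation depends on how quickly the singular values of X decay, so it is worth checking against an exact result on a subsample.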

petrichor
  • Just for clarification, shouldn't it be "The singular *values*, which are the standard deviations"? – Shadow Oct 17 '14 at 07:02
1

My guess is that you have a huge data set. You don't need all of the svd coefficients. In this case, use svds instead of svd :

Taken directly from the MATLAB help:

 s = svds(A,k) computes the k largest singular values and associated singular vectors of matrix A.

From your question, I understand that you don't call svd directly. But you might as well take a look at princomp (It is editable!) and alter the line that calls it.
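As a hedged sketch of calling svds directly rather than editing princomp, the snippet below centers the data and asks for only the k leading singular triplets. The sizes and k are illustrative, and whether this fits in memory for the full data set depends on how the installed release implements svds.

```matlab
% Stand-in data (illustrative sizes).
n = 5000; p = 132;
X = rand(n, p);

Xc = bsxfun(@minus, X, mean(X, 1));   % center every feature
k = 5;                                % hypothetical number of components

[U, S, V] = svds(Xc, k);              % k largest singular values and vectors

eigenVectors = V;                     % principal directions (p-by-k)
eigenValues  = diag(S).^2 / (n - 1);  % variances along those directions
projected    = U * S;                 % scores, same as Xc * V
```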

Andrey Rubshtein
0

Your computation probably needed an n-by-n matrix at some point, that is:

329150 * 329150 * 8 bytes ~ 866 GB

of space, which explains why you are getting a memory error. There seems to be a more efficient way to calculate PCA using princomp(X, 'econ'), which I suggest you try.

More on this on Stack Overflow and MathWorks.
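The arithmetic can be checked in a few lines. This sketch only compares the storage for a full n-by-n factor against the economy-size n-by-p factor and the p-by-p matrix, assuming 8-byte doubles; the variable names are illustrative.

```matlab
n = 329150;            % number of data points
p = 132;               % number of features
bytesPerDouble = 8;

fullGB = n * n * bytesPerDouble / 1e9;   % full n-by-n matrix, roughly 867 GB
econGB = n * p * bytesPerDouble / 1e9;   % economy-size n-by-p matrix, roughly 0.35 GB
gramMB = p * p * bytesPerDouble / 1e6;   % p-by-p matrix, roughly 0.14 MB

fprintf('full: %.1f GB, econ: %.2f GB, p-by-p: %.2f MB\n', fullGB, econGB, gramMB);
```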

none
  • `329150` is the number of data points. I have `132` features. So basically I have to compute a `132 by 132` matrix. – Simon Oct 03 '12 at 05:20
  • you wouldn't go out of memory with a `132 by 132` matrix. see [this](http://www.mathworks.cn/matlabcentral/newsreader/view_thread/235282), [this](http://www.mathworks.com/matlabcentral/newsreader/view_thread/109472) and [this](http://compgroups.net/comp.soft-sys.matlab/pca-out-of-memory/887804) to calculate the required memory for pca. you can just try what I said or just subsample your data. – none Oct 03 '12 at 08:42
0

Manually compute X'X (132x132) and run svd on it. Or find a NIPALS script.
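One way to read "manually compute X'X" is to accumulate it block by block, so that only a slice of the data has to be in memory at a time. The sketch below assumes that reading; the block size and variable names are hypothetical, and the in-memory X stands in for blocks that could be loaded from disk.

```matlab
% Stand-in data (illustrative sizes).
n = 5000; p = 132;
X = rand(n, p);
blockSize = 1000;                 % hypothetical number of rows per block

G = zeros(p, p);                  % running sum of Xb' * Xb
colSum = zeros(1, p);             % running sum of the rows
for first = 1:blockSize:n
    last = min(first + blockSize - 1, n);
    Xb = X(first:last, :);        % one block of rows
    G = G + Xb.' * Xb;
    colSum = colSum + sum(Xb, 1);
end

mu = colSum / n;                        % feature means
C = (G - n * (mu.' * mu)) / (n - 1);    % sample covariance from the raw sums
C = (C + C.') / 2;                      % remove rounding asymmetry

[V, S, ~] = svd(C);               % C is symmetric PSD, so these are its eigenpairs
eigenVectors = V;                 % principal directions, already sorted
eigenValues  = diag(S);           % variances, in descending order
```

Subtracting the mean term from raw sums can lose precision when the means are large relative to the spread, in which case a two-pass version (means first, then centered blocks) is safer.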

Tae-Sung Shin