For a data matrix of size n-by-p, PRINCOMP returns a coefficient matrix of size p-by-p, where each column is a principal component expressed in terms of the original dimensions. So in your case the coefficient matrix alone would take:
1036800 * 1036800 * 8 bytes ~ 7.8 TB
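(A quick back-of-the-envelope check in MATLAB:)
» p = 1036800;
» p^2 * 8 / 2^40      % TiB for a full p-by-p double matrix: ~7.82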
Consider using PRINCOMP(X,'econ') to return only the PCs that can have non-zero variance (when n ≤ p, that is at most the first n-1).
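For example (just a sketch of the call; coeff/score/latent are the standard output names):
» [coeff, score, latent] = princomp(X, 'econ');   % coeff is p-by-(n-1) instead of p-by-p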
Alternatively, consider performing PCA by SVD: in your case n << p, so the p-by-p covariance matrix is far too large to even form. Therefore, instead of decomposing the p-by-p matrix X'X, it suffices to decompose the much smaller n-by-n matrix XX'. Refer to this paper for the details.
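To illustrate the trick, here is a minimal sketch of the eigendecomposition route (variable names are my own; it assumes the centered data has full row rank so that no eigenvalue is zero):
X0 = bsxfun(@minus, X, mean(X,1));   % center the data
G  = X0*X0';                         % small n-by-n Gram matrix, not p-by-p
[W,D] = eig(G);                      % cheap n-by-n eigendecomposition
[d,idx] = sort(diag(D),'descend');   % sort by decreasing variance
W  = W(:,idx);
PC = X0' * W * diag(1./sqrt(d));     % recover the p-dimensional unit-norm PCs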
EDIT:
Here's my implementation; the outputs of this function match those of PRINCOMP (the first three, anyway):
function [PC,Y,varPC] = pca_by_svd(X)
% PCA_BY_SVD
% X data matrix of size n-by-p where n<<p
% PC columns are first n principal components
% Y data projected on those PCs
% varPC variance along the PCs
%
X0 = bsxfun(@minus, X, mean(X,1));   % center the data (zero mean per column)
[U,S,PC] = svd(X0,'econ');           % economy-size SVD: X0 = U*S*PC'
Y = X0*PC;                           % project the data onto the PCs
varPC = diag(S'*S)' / (size(X,1)-1); % variance along each PC: s_i^2/(n-1)
end
I just tried it on my 4GB machine, and it ran just fine:
» x = rand(16,1036800);
» [PC, Y, varPC] = pca_by_svd(x);
» whos
  Name            Size                  Bytes  Class     Attributes

  PC        1036800x16            132710400  double
  Y              16x16                  2048  double
  varPC           1x16                   128  double
  x              16x1036800      132710400  double
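As a quick sanity check (reusing the variables above), the projection is lossless, so reconstructing the centered data from Y and PC gives an error on the order of machine precision:
» X0 = bsxfun(@minus, x, mean(x,1));
» norm(X0 - Y*PC', 'fro')   % essentially zero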
Update: The princomp function has since been deprecated in favor of pca, introduced in R2012b, which includes many more options.
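For example (a sketch of the newer interface; 'NumComponents' is one of its documented name-value options):
» [coeff, score, latent] = pca(x);              % economy-size by default when n < p
» [coeff, score] = pca(x, 'NumComponents', 10); % keep only the first 10 PCs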