I have about 1000 vectors x_i of dimension 50000, but they are very sparse; each has only about 50-100 nonzero elements. I want to run PCA on this dataset (in MATLAB) to reduce its unnecessarily high dimensionality.
Unfortunately, I don't know of any way to do this without an intermediate full matrix, because the means have to be subtracted from all examples. And of course, a full 1000x50000 matrix is too big to fit into memory (trying it actually crashes my entire computer for some reason). MATLAB's built-in princomp crashes my computer when I try to use it, too.
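To spell out the obstacle, here is a short sketch. The symbols X, mu, X_c, and C are notation introduced only for illustration, and the byte counts assume 8-byte double-precision storage:

$$
X \in \mathbb{R}^{n \times d}, \quad n = 1000, \quad d = 50000, \qquad
\mu = \frac{1}{n} X^{\top} \mathbf{1}_n, \qquad
X_c = X - \mathbf{1}_n \mu^{\top}
$$

$$
(X_c)_{ij} = x_{ij} - \mu_j
\quad\Longrightarrow\quad
(X_c)_{ij} = -\mu_j \neq 0 \ \text{ whenever } x_{ij} = 0 \text{ and } \mu_j \neq 0
$$

$$
\text{storage}(X_c) \approx n d \cdot 8 \ \text{bytes} = 4 \times 10^{8} \ \text{bytes}, \qquad
\text{storage}(C) \approx d^{2} \cdot 8 \ \text{bytes} = 2 \times 10^{10} \ \text{bytes},
\quad C = \frac{1}{n-1} X_c^{\top} X_c
$$

So any column whose mean is nonzero becomes (generically) fully dense after centering, even though X itself holds only about 50-100 nonzeros per row. If a routine additionally forms the d-by-d covariance matrix C, the memory cost grows by a further factor of d/n = 50.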
So my question is: is there a way to do PCA on this data without requiring a massive non-sparse matrix as an intermediate step?