
I have to calculate a PCA of a 400×60000 matrix using `processpca` from the Neural Network Toolbox (it is a lecture exercise, so I don't think I can use alternatives), on a 64-bit machine with 8 GB of RAM. The error message is:

Error using eye
Out of memory. Type HELP MEMORY for your options.

Error in processpca.create (line 15)
    settings.transform = eye(R);

Error in processpca (line 60)
[y,settings] = processpca.create(x,param);

Error in pca (line 21)
[trainDataPCA,psPCA] = processpca(trainData);

My code is:

% PCA - Reduce feature dimensions to selected dimensions
function [trainDataPCA,trainLabelsPCA] = pca(trainData, trainLabels, nDim)

if nargin < 3
    error('Exactly three arguments are needed');
end

% Reduce nDim to max size of input data
nDimIn = size(trainData,1);
nDimOut = min(nDim,nDimIn);

% Normalise features - standard deviation 1, mean 0
trainData = trainData';
trainLabels = trainLabels';
[trainData,trainPS] = mapstd(trainData);

% Calculate PCA
[trainDataPCA,psPCA] = processpca(trainData);
trainDataPCA = trainDataPCA(1:nDimOut,:);   % keep the first nDimOut components (rows = dimensions here)

% Map PCA to labels
trainLabelsPCA = processpca('apply',trainLabels,psPCA);
trainLabelsPCA = trainLabelsPCA(1:nDimOut,:);

trainDataPCA = trainDataPCA';
trainLabelsPCA = trainLabelsPCA';

What can I do in this situation?

EDIT: My memory:

Maximum possible array:      9861 MB (1.034e+10 bytes) *
Memory available for all arrays:      9861 MB (1.034e+10 bytes) *
Memory used by MATLAB:       680 MB (7.128e+08 bytes)
Physical Memory (RAM):      8187 MB (8.585e+09 bytes)

*  Limited by System Memory (physical + swap file) available.
Matthias Preu
  • `400 x 60000` is a pretty large matrix if it's `full`. It looks like when you're using `processpca`, it's also trying to allocate matrices that are the same size as `400 x 60000` before it proceeds to train your data. You will very quickly run out of memory. One thing you could do is try increasing the Java Heap Size. This is a shot in the dark but it could work: Go into your `Preferences -> General -> Java Heap Memory`. Max it out to as much as you can. – rayryeng Jul 08 '14 at 16:49
  • Hmm, I set the heap memory to 2046 MB, but that did not help. The matrix has 400 grey values for 60000 digits; from my point of view there are a lot of zero values in it. Can I optimize something here? – Matthias Preu Jul 08 '14 at 16:57
  • Did you try using `sparse`? http://www.mathworks.com/help/matlab/ref/sparse.html. Take your matrix and do `sparse(A);`, where `A` is your matrix. `sparse` is used to represent matrices with many zero elements. This could give you some memory savings. Hopefully the NN Toolbox accepts `sparse` matrices as valid input as well. I haven't used the NN Toolbox in a long time so I can't provide any insight here. BTW, are you doing hand-digit recognition with the MNIST database? Their patches are 20 x 20 with 60000 test images, so the 400 pixels per image seems familiar. – rayryeng Jul 08 '14 at 17:00
  • No did not try that yet, thanks for the hint. Yes, I'm using the MNIST database to make some basic investigations about pattern recognition using neural networks. – Matthias Preu Jul 08 '14 at 17:04
  • Cool! OK, you can **definitely** try using `sparse`. The ratio of black pixels to white (hand-written) pixels is quite large. – rayryeng Jul 08 '14 at 17:04
  • FWIW, if you need hints on reading in the database properly, I don't know if you already have, but check my previous post: http://stackoverflow.com/questions/24127896/reading-mnist-image-database-binary-file-in-matlab/24128983#24128983 – rayryeng Jul 08 '14 at 17:05
  • Thanks for that, but I have to use the provided function from here: http://www.mathworks.com/matlabcentral/fileexchange/27675-read-digits-and-labels-from-mnist-database/content/readMNIST.m . I tried using `trainData = sparse(trainData);[trainDataPCA,psPCA] = processpca(trainData);`, but always getting out of memory errors. Maybe splitting the matrix (column wise) and calculating the pca for each part can help? – Matthias Preu Jul 08 '14 at 17:15
  • Yes, I would say that is more memory efficient. Run it through a `for` loop. Pre-allocate `trainDataPCA` first, and make `psPCA` a `cell` array, then iterate through each of the columns. Hopefully this will work! (A sketch of a block-wise variant appears after these comments.) – rayryeng Jul 08 '14 at 17:20
  • Thanks for that code, but I get the error in the first iteration in the loop (don't know how that can happen). Rather difficult to compute that ;). – Matthias Preu Jul 08 '14 at 17:31
  • With around 30000 values I can get the function to work; after the function finishes, only 807 MB of memory are used by MATLAB. Maybe computing half of the matrix, saving it to the hard disk and doing a merge outside the function later on could work. – Matthias Preu Jul 08 '14 at 18:03
  • Yeah sorry about that bad code. I removed it before you could use it, but it looks like you already did :P. OK, that could work. Use `save` and save the relevant variables to disk, clear those variables and run it again. When you're done, reload those variables and then proceed. I wonder how Yann LeCun did it?! – rayryeng Jul 08 '14 at 18:07
  • No apology necessary, I'm grateful for every help you gave me. I will give that method a try, but it seems not like a 'clean' solution for me too. But when it works I'm ok with that at the moment :D – Matthias Preu Jul 08 '14 at 18:16
  • 1
    just a quick note about increasing the Java heap size; actually that would have an adverse effect, because it would reserve memory for Java taken away from the total memory available for other things. In this case the algorithm is implemented in MATLAB not in Java. In fact it will end up calling the `svd` function which does the majority of the work. Reading the source code, `processpca` is implemented similarly to what I've shown in this post: http://stackoverflow.com/a/3181851 – Amro Jul 08 '14 at 18:35
  • @Amro - Very nice. Yeah I naively suggested to increase the Java heap size without fully knowing its ramifications. Again, you have taught me something new. – rayryeng Jul 08 '14 at 19:36
  • @Meiner - Consider using Amro's implementation of PCA via SVD to be more memory efficient and for being numerically more stable. – rayryeng Jul 08 '14 at 19:56
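For reference, here is a minimal sketch of the SVD-based PCA that Amro and rayryeng point to. It is written from the description and the linked post, not taken from the `processpca` source; the function name, variable names, and the assumption that the centred 400×60000 matrix fits in memory are mine.

% Sketch: PCA via economy-size SVD.
% X is nFeatures x nSamples (features as rows, samples as columns).
function [score, coeff] = pcaSvdSketch(X, nDimOut)
    X = bsxfun(@minus, X, mean(X, 2));   % centre every feature (row) at zero
    [U, S] = svd(X, 'econ');             % economy SVD: U is only nFeatures x nFeatures
    coeff = U(:, 1:nDimOut);             % principal directions (loadings)
    score = coeff' * X;                  % projected data, nDimOut x nSamples

Because the economy-size SVD only needs a 400×400 matrix of left singular vectors (diag(S) holds the singular values if you want the explained variance), the extra memory cost beyond one centred copy of the data is small, and it is the numerically more stable route mentioned in the comments.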
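If even one full-size temporary copy is too much, a block-wise variant of the same idea is possible. This is my own sketch building on the column-splitting suggestion in the comments, not code from the discussion; the block size of 5000 and all names are arbitrary. It accumulates the small 400×400 covariance matrix over column blocks and takes its eigenvectors.

% Sketch: PCA from a covariance matrix accumulated over column blocks.
% X is nFeatures x nSamples; only nFeatures x nFeatures temporaries are
% kept while accumulating.
function [score, coeff] = pcaChunkedSketch(X, nDimOut, blockSize)
    if nargin < 3, blockSize = 5000; end
    [nFeat, nSamp] = size(X);
    mu = mean(X, 2);
    C = zeros(nFeat, nFeat);
    for k = 1:blockSize:nSamp
        idx = k:min(k + blockSize - 1, nSamp);
        Xb = bsxfun(@minus, X(:, idx), mu);   % centre this block only
        C = C + Xb * Xb';                     % accumulate the scatter matrix
    end
    C = C / (nSamp - 1);                      % sample covariance
    [V, D] = eig(C);
    [~, order] = sort(diag(D), 'descend');    % strongest components first
    coeff = V(:, order(1:nDimOut));
    score = coeff' * bsxfun(@minus, X, mu);   % projection; could also be done block-wise

Note that forming the covariance matrix explicitly is less numerically stable than the SVD route above, so the SVD version is preferable whenever the data fits in memory.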

1 Answer


I had the same problem with the same data set on exactly the same hardware. I increased the Windows virtual memory size on one of my drives to fall in the range 10 GB to 30 GB. After restarting Windows and running MATLAB, everything was fine.

Armando