8

When I assign a matrix into a slice of a much larger preallocated array, MATLAB somehow appears to duplicate it while 'copying' it, and if the matrix being copied is large enough, I run out of memory. This is the sample code:

main_mat=zeros(500,500,2000);
n=500;
slice_matrix=zeros(500,500,n);
for k=1:4
    parfor i=1:n
        slice_matrix(:,:,i)=gather(gpuArray(rand(500,500)));
    end
    main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)=slice_matrix; %This is where the memory will likely overflow
end

Any way to just 'smash' the slice_matrix onto the main_mat without the overhead? Thanks in advance.

EDIT:

The overflow occurs when main_mat is preallocated at full size. If main_mat is initialized with main_mat=zeros(500,500,1); (a smaller size), the overflow does not occur, but the code slows down because the array is not allocated before the matrix is assigned into it. This significantly reduces performance as the range of k increases.

Gregor Isack
  • As to your loops: [it's recommended to set the *outer* loop to a `parfor` loop for optimisation purposes](https://mathworks.com/help/parallel-computing/nested-parfor-loops-and-for-loops.html). Additionally, `parfor` copies your data to each separate worker, thus assuming 4 workers it duplicates your data four times in RAM. – Adriaan Dec 16 '19 at 15:55
  • What is your indication that Matlab is actually duplicating the memory? Are you using the [`memory`](https://www.mathworks.com/help/matlab/ref/memory.html) function? The task-manager? A memory error from Matlab? At what line of code is it happening? – Eliahu Aaron Dec 16 '19 at 16:46
  • As you can see from my comment in the code, `main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)` is where the memory overflow occurs. I verified that it overflows when I preallocate `main_mat` and does not when I don't. Matlab returns an 'out of memory' error. – Gregor Isack Dec 16 '19 at 17:10
  • Does your 500x500x2000 matrix fit in memory? It's ~4 GB. See https://stackoverflow.com/q/51987892/7328782 for why the out of memory error could happen only when writing to the array. – Cris Luengo Dec 16 '19 at 17:45
  • To better understand your problem, could you insert a `h=h+slice_matrix(end)` prior to `main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)=slice_matrix;` (and initialize h with 0)? I suspect that this newly added line will already cause your memory issues. – Daniel Dec 16 '19 at 22:42
  • @CrisLuengo yeah, that matrix size does actually fit in my memory (I have 32GB of RAM); in a real-case scenario, it could be `1008x480x4000` or larger. @Daniel Adding that line didn't cause any memory issue; the error occurs right after it, at the `main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)=slice_matrix` line. To be honest I'm not surprised by this, as `h` is initialized as a small matrix. – Gregor Isack Dec 17 '19 at 03:26
  • You can only tag one user in each comment, Daniel likely didn't get notified of your comment. – Cris Luengo Dec 17 '19 at 04:11
  • If you do `main_mat=zeros(500,500,2000); main_mat(end)=1` at the beginning of your code, do you get an error earlier on? – Cris Luengo Dec 17 '19 at 04:12
  • I'm still getting the error at the `main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)=slice_matrix` line; `main_mat(end)=1` doesn't seem to return any error. – Gregor Isack Dec 17 '19 at 04:31
  • @Daniel guess I'll have to tag you again :) – Gregor Isack Dec 17 '19 at 04:31
  • @GregorIsack: Looking at the code, I expected the problem to be the gathering of slice_matrix on the primary instance, which should raise the memory issues no later than the first use of the matrix. That's why I asked you to insert some small operation using the matrix. Looks like my assumption was wrong. – Daniel Dec 17 '19 at 20:17

5 Answers

4

The main issue is that an array holding actual numbers takes more physical memory than a freshly created array of zeros. main_mat=zeros(500,500,2000); takes little RAM, while main_mat=rand(500,500,2000); takes a lot, no matter whether you use the GPU or parfor (in fact, parfor will make you use more RAM). So this is not an unnatural swelling of memory. Following Daniel's link below, it seems that creating an array of zeros only reserves address space, and the physical memory is filled only when you write actual "numbers" into the matrix. This is managed by the operating system, and it is expected on Windows, Mac and Linux, whether you do it with Matlab or another language such as C.
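A minimal sketch of how one might check this, assuming a small array size chosen only so it runs quickly (the variable names are illustrative, not from the question). `whos` reports the same size for both arrays, so any difference would only be visible in the operating system's memory monitor, and only once values are written:

```matlab
% Hypothetical small size so the sketch runs anywhere; scale up to see the effect.
sz = [500 500 20];

A = zeros(sz);   % the OS may only reserve address space here
B = ones(sz);    % nonzero fill forces the memory to be written right away

infoA = whos('A');
infoB = whos('B');
sameReportedSize = (infoA.bytes == infoB.bytes);   % whos cannot tell them apart

A(:) = 1;        % writing values touches every element of A
```

Watching the MATLAB process in a system monitor while stepping through these lines with a much larger `sz` is one way to see when the memory is actually taken.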

Yuval Harpaz
  • Right now I no longer understand MATLAB. Once I type in the commands with the `zeros` the whole virtual memory is actually allocated, but no memory is used. `whos` shows the same size for both matrices, while my os shows a different memory consumption. I deleted my comment because your answer is definitely not wrong. – Daniel Jan 02 '20 at 14:29
  • I found something explaining this: https://stackoverflow.com/questions/51987892/we-need-to-preallocate-but-matlab-does-not-preallocate-the-preallocation – Daniel Jan 02 '20 at 14:35
  • @Gregor: I guess to confirm this, try it with `ones` instead of `zeros`, this makes sure the memory is actually allocated at the time of calling the respective function. – Daniel Jan 02 '20 at 21:10
  • If I understand everything right, the conclusion is: There is no temporary copy. The out of memory exceptions arise because `main_mat` is assigned nonzero values. Previously only virtual memory (address space) was reserved; it is now being backed by physical memory. – Daniel Jan 05 '20 at 13:51
1

Removing parfor will likely fix your problem.

parfor is not useful there. MATLAB's parfor does not use shared memory parallelism (i.e. it doesn't start new threads) but rather distributed memory parallelism (it starts new processes). It is designed to distribute work over a set of worker nodes. And though it also works within one node (or a single desktop computer) to distribute work over multiple cores, it is not an optimal way of doing parallelism within one node.

This means that each of the processes started by parfor needs to have its own copy of slice_matrix, which is the cause of the large amount of memory used by your program.

See "Decide When to Use parfor" in the MATLAB documentation to learn more about parfor and when to use it.

Cris Luengo
  • Is removing `parfor` _the only way_? The processing works best when designed that way, since everything inside `parfor` is CPU and GPU intensive, so it significantly improves the performance. – Gregor Isack Dec 16 '19 at 16:05
  • @GregorIsack: I went with your example code, didn't know you actually did a lot of work inside the `parfor`. If so, then yes, it is likely useful. -- Maybe if `slice_matrix` is not a `gpuArray` it won't be copied in the assignment. – Cris Luengo Dec 16 '19 at 16:14
  • Hmmm, even if `slice_matrix` is not a `gpuArray`, I'm still getting the overflow symptom. I'll leave this question open; let's see if there are any alternative solutions. Thanks for the answer though! – Gregor Isack Dec 16 '19 at 16:45
0

I assume that your code is just sample code and that rand() stands in for a custom function in your MVE. So here are a few hints and tricks regarding memory usage in Matlab.

There is a snippet from The MathWorks training handbooks:

When assigning one variable to another in MATLAB, as occurs when passing parameters into a function, MATLAB transparently creates a reference to that variable. MATLAB breaks the reference, and creates a copy of that variable, only when code modifies one or more of the values. This behavior, known as copy-on-write, or lazy-copying, defers the cost of copying large data sets until the code modifies a value. Therefore, if the code performs no modifications, there is no need for extra memory space and execution time to copy variables.
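As a small illustration of the copy-on-write behaviour described in that snippet (the variable names and the array size are made up for this sketch):

```matlab
a = rand(1000);   % about 8 MB of doubles, all values strictly between 0 and 1
b = a;            % no data is copied yet; b refers to the same data as a
b(1) = -1;        % first modification: MATLAB now makes a separate copy for b
```

After the last line, `a` is unchanged and only `b` holds the modified value, so from that point on both arrays occupy their own memory.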

The first thing to do would be to check the (memory) efficiency of your code. Even the code of excellent programmers can be further optimized with (a little) brain power. Here are a few hints regarding memory efficiency:

  • make use of the native vectorization of Matlab, e.g. sum(X,2), mean(X,2), std(X,[],2)
  • make sure that Matlab does not have to expand matrices (implicit expansion was introduced in R2016b). It might be more efficient to use bsxfun
  • use in-place operations, e.g. x = 2*x+3 rather than y = 2*x+3
  • ...
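To illustrate the vectorization and expansion hints above, here is a small sketch; `X` and the row-centering task are made up for illustration. Both forms give the same result, and which one uses less memory or time is something to measure on your own data rather than assume:

```matlab
X  = magic(4);                        % small example matrix
mu = mean(X, 2);                      % vectorized row means, no explicit loop

centered1 = X - mu;                   % implicit expansion (R2016b and later)
centered2 = bsxfun(@minus, X, mu);    % explicit expansion via bsxfun

X = 2*X + 3;                          % result overwrites X instead of a new variable
```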

Be aware that optimizing for memory usage is not the same as optimizing for computation time. Therefore, you might want to consider reducing the number of workers or refraining from using the parfor loop. (As parfor cannot use shared memory, there is no copy-on-write feature when using the Parallel Computing Toolbox.)
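One way to reduce the number of workers without changing the pool is the optional second argument of `parfor`, which caps how many workers the loop may use. The sketch below assumes a toy workload, and `maxWorkers = 2` is an arbitrary illustrative value; without the Parallel Computing Toolbox the loop simply runs serially:

```matlab
n = 8;
maxWorkers = 2;            % hypothetical cap; tune against your available RAM
out = zeros(4, 4, n);      % sliced output variable, one page per iteration

parfor (i = 1:n, maxWorkers)
    out(:, :, i) = i * ones(4, 4);   % stand-in for the real per-slice work
end
```

Fewer workers means fewer processes each holding their own data, at the cost of less parallel speed-up.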

If you want to have a closer look at your memory, what is available and what can be used by Matlab, check out feature('memstats'). What is interesting for you is the Virtual Memory, which is described as:

Total and available memory associated with the whole MATLAB process. It is limited by processor architecture and operating system.

Alternatively, use the command [user,sys] = memory.

Quick side note: Matlab stores matrices contiguously in memory, so you need a large block of free RAM for large matrices. That is also the reason why you want to preallocate variables: growing them dynamically forces Matlab to copy the entire matrix to a larger spot in RAM every time it outgrows the current one.
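A short sketch of the difference between growing and preallocating, using a made-up loop that stores squares (sizes and names are illustrative only):

```matlab
n = 1000;

% Growing: the array may have to be moved to a larger block as it grows.
grown = [];
for i = 1:n
    grown(end+1) = i^2; %#ok<SAGROW>
end

% Preallocating: one block is reserved up front and filled in place.
pre = zeros(1, n);
for i = 1:n
    pre(i) = i^2;
end
```

Both loops produce the same values; the preallocated version just avoids the repeated reallocation.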

If you really have memory issues, you might just want to dig into the art of data types, as is required in lower-level languages. E.g. you can cut your memory usage in half by using single precision right from the start: main_mat=zeros(500,500,2000,'single');. By the way, this also works with rand(...,'single') and more native functions, although a few of the more sophisticated Matlab functions require input of type double, which you can upcast again.
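The halving can be checked with `whos` on a small array, and the arithmetic for the full array from the question follows directly. The small sizes and variable names below are chosen only for illustration:

```matlab
d = zeros(500, 500, 4);             % double: 8 bytes per element
s = zeros(500, 500, 4, 'single');   % single: 4 bytes per element

infoD = whos('d');
infoS = whos('s');
ratio = infoD.bytes / infoS.bytes;  % double needs twice the bytes of single

% Size of the full 500x500x2000 array from the question, in GB (1e9 bytes)
fullDoubleGB = 500*500*2000*8 / 1e9;
fullSingleGB = 500*500*2000*4 / 1e9;
```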

max
0

If I understand correctly, your main issue is that parfor does not allow workers to share memory. Think of every parfor worker as almost a separate Matlab instance.

There is basically just one workaround for this that I know of (and that I have never tried), namely 'shared matrix' on the File Exchange: https://ch.mathworks.com/matlabcentral/fileexchange/28572-sharedmatrix

More solutions, as others suggested: removing parfor is certainly one; get more RAM; use tall arrays (which use hard drives when RAM runs full, read here); divide operations into smaller chunks; and last but not least, consider an alternative to Matlab.

user2305193
0

You may use the following code. You don't actually need slice_matrix:

main_mat=zeros(500,500,2000);
n=500;
for k=1:4
   offset=(k-1)*n; % constant inside the parfor, so main_mat is indexed in the simple form offset+i
   parfor i=1:n
       main_mat(:,:,offset+i) = gather(gpuArray(rand(500,500)));
   end
   %% now you don't need this main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)=slice_matrix; %This is where the memory will likely overflow
end
Mayank Kumar Chaudhari