I am trying to write parallel code to speed up the processing of a very large array (a couple of hundred million rows). To parallelise this, I chopped my data into 8 pieces (my number of cores) and tried sending each worker one piece. Looking at my RAM usage, however, it seems every piece is sent to every worker, effectively multiplying my RAM usage by 8. A minimal working example:

A = 1:16;
for ii = 1:8
    data{ii} = A(2*ii-1:2*ii);
end

Now, when I send this data to workers using parfor it seems to send the full cell instead of just the desired piece:

output = cell(1,8);
parfor ii = 1:8
    output{ii} = data{ii};
end

I actually use some function within the parfor loop, but this illustrates the case. Does MATLAB actually send the full cell data to each worker, and if so, how to make it send only the desired piece?

Adriaan
  • If your data is a [sliced variable](http://mathworks.com/help/distcomp/sliced-variables.html) it will be "sliced" and only those slices will be transmitted to the workers; are you using sliced variables in your real code? – m.s. Aug 19 '15 at 12:45
  • I'm using a cell array in my actual code, as presented here. I'll look into the sliced variable function, thanks. – Adriaan Aug 19 '15 at 12:46
  • Maybe do slicing manually, submitting individual jobs for each piece: http://de.mathworks.com/help/distcomp/submit.html – Daniel Aug 19 '15 at 12:47
  • Note: the `gather` after the `parfor` loop here is redundant - `gather` is used to convert a `distributed` array into a regular MATLAB array. – Edric Aug 20 '15 at 09:47
  • Slicing your variables may definitely be the way to go. We can't help you a lot on telling you if you are slicing it right or not without seeing your code tho. – Ikaros Sep 01 '15 at 08:45
  • @HamtaroWarrior I'm splitting it in exactly the way I presented. I have an `Mx7` array with `M` in the order of hundreds of millions, and chop that into eight (my number of cores) pieces, then store each piece in a cell, just as presented. – Adriaan Sep 01 '15 at 13:25

3 Answers

In my personal experience, parfeval is better than parfor regarding memory usage. In addition, your problem seems easy to break into smaller parts, so you can use parfeval to submit a larger number of smaller jobs to the MATLAB workers.

Let's say you have workerCnt MATLAB workers that are going to handle jobCnt jobs. Let data be a cell array of size jobCnt x 1, where each element is the input for one call of the function getOutput, which does the analysis on the data. The results are then stored in the cell array output of size jobCnt x 1.

In the following code, jobs are submitted in the for loop and the results are retrieved in the while loop. The logical vector doneJobs records which jobs are done.

poolObj = parpool(workerCnt);
jobCnt = length(data); % number of jobs
output = cell(jobCnt,1);
for jobNo = 1:jobCnt
    future(jobNo) = parfeval(poolObj,@getOutput,...
        nargout('getOutput'),data{jobNo});
end
doneJobs = false(jobCnt,1);
while ~all(doneJobs)
    [idx,result] = fetchnext(future);
    output{idx} = result;
    doneJobs(idx) = true;
end

You can take this approach one step further to save more memory: after fetching the results of a finished job, delete the corresponding element of future. Each future object stores all the input and output data of its getOutput call, which is probably huge. Be careful, though: deleting elements of future shifts the indices of the remaining ones, so the index returned by fetchnext no longer matches the original job number.

The following is the code I wrote for this purpose.

poolObj = parpool(workerCnt);
jobCnt = length(data); % number of jobs
output = cell(jobCnt,1);
for jobNo = 1:jobCnt
    future(jobNo) = parfeval(poolObj,@getOutput,...
        nargout('getOutput'),data{jobNo});
end
doneJobs = false(jobCnt,1);
while ~all(doneJobs)
    [idx,result] = fetchnext(future);
    future(idx) = []; % remove the done future object
    oldIdx = 0;
    % find the index offset and correct index accordingly
    while oldIdx ~= idx
        doneJobsInIdxRange = sum(doneJobs((oldIdx + 1):idx));
        oldIdx = idx;
        idx = idx + doneJobsInIdxRange;
    end
    output{idx} = result;
    doneJobs(idx) = true;
end
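
The index correction in the loop above can probably be written more compactly. The sketch below is not part of the original answer. It assumes, as the code above does, that finished futures are deleted one at a time so the remaining futures keep their relative order. The names stillPending and origIdx are made up for illustration, and the values of doneJobs and idx are example inputs only.

```matlab
% doneJobs: logical vector over the ORIGINAL job numbers (example values).
% idx: index returned by fetchnext into the shortened future array.
doneJobs = [true false true false false];
idx = 2;

% Original job numbers of all jobs that have not finished yet.
stillPending = find(~doneJobs);

% The idx-th remaining future belongs to the idx-th unfinished job.
origIdx = stillPending(idx);
```

Inside the while loop, origIdx would take the place of the corrected idx when writing to output and doneJobs.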
milaniez

The comment from @m.s is correct: when parfor slices an array, each worker is sent only the slice needed for the loop iterations it is working on. However, you may well see RAM usage rise beyond what you originally expect because, unfortunately, copies of the data are required as it is passed from the client to the workers via the parfor communication mechanism.
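
As a rough illustration of when parfor can slice a variable, the sketch below reuses the example from the question and sums each piece in two ways. In the first loop data is indexed by the loop variable alone, which is the form parfor can slice. In the second loop the index goes through a lookup vector, so parfor has to treat data as a broadcast variable and send all of it to every worker. The names order, slicedOut and broadcastOut are hypothetical and exist only for this example.

```matlab
A = 1:16;
data = cell(1, 8);
for ii = 1:8
    data{ii} = A(2*ii-1:2*ii);
end

% Sliced: data is indexed by the loop variable only, so each worker
% should receive just the cells for its own iterations.
slicedOut = zeros(1, 8);
parfor ii = 1:8
    slicedOut(ii) = sum(data{ii});
end

% Broadcast: indexing through a lookup vector means parfor cannot
% slice data, so the whole cell array goes to every worker.
order = 1:8;   % hypothetical lookup vector, for illustration only
broadcastOut = zeros(1, 8);
parfor ii = 1:8
    broadcastOut(ii) = sum(data{order(ii)});
end
```

Both loops compute the same sums; only the amount of data transferred differs.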

If you need the data only on the workers, then the best solution is to create/load/access it only on the workers if possible. It sounds like you're after data parallelism rather than task parallelism, for which spmd is indeed a better fit (as @Kostas suggests).
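
A minimal sketch of creating the data only on the workers, using the toy numbers from the question: each iteration builds its own piece inside the loop, so no input array has to travel from the client. The variable names nPieces and piece are assumptions for this example. In real code the line that builds piece would be replaced by whatever loads or generates that chunk on the worker.

```matlab
nPieces = 8;
output = cell(1, nPieces);
parfor ii = 1:nPieces
    % Build this iteration's piece on the worker itself instead of
    % sending it from the client.
    piece = (2*ii - 1):(2*ii);
    output{ii} = sum(piece);
end
```

Only the small results in output are sent back to the client.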

Edric

I would suggest using the spmd command of MATLAB.

You can write code almost as it would be for a non-parallel implementation and also have access to the current worker by the labindex "system" variable.

Have a look here:

http://www.mathworks.com/help/distcomp/spmd.html

And also at this SO question about spmd vs parfor:

SPMD vs. Parfor

Community
Xxxo
  • Whilst using SPMD might help the code, it circumvents the question rather than answering it. I want to know how to properly send my data to workers without ridiculous overhead, so I can use it in other pieces of code as well. – Adriaan Aug 19 '15 at 13:43
  • Using spmd or parfor does not matter in terms of RAM usage, but the code using spmd takes ~1.2 times longer than the parfor implementation. – Adriaan Aug 19 '15 at 13:47
  • If you use the same logic as in the provided example, then you have something wrong in your code. Since finding the actual error would practically require debugging your code, I suggested a quick alternative to solve your problem. If you specifically require parfor, then you should post the whole code. – Xxxo Aug 19 '15 at 21:24
  • I do not get an error, I merely suspect the data gets sent to all workers and thereby multiplied. The question thus is not to debug my code, but to understand the workings of how MATLAB distributes data over workers. – Adriaan Aug 20 '15 at 06:23