
I'm new to parallel processing. Here's my problem:

I have a big data variable that cannot fit twice in RAM. Therefore, this won't work:

for ind=1:4
  data{ind}=load_data(ind);
end

parfor ind=1:4
  process_longtime(data{ind});
end

This fails with a memory overflow. My hypothesis is that MATLAB tries to copy the whole `data` variable to every worker.

If this is correct, is there a way to distribute the data in 4 (or n) parts to the workers, so that no worker needs access to the whole `data` variable?

user2305193
  • See the duplicate; `data` is indeed broadcast to all workers, i.e. multiplied in RAM. There are several solutions to this, as the answers on the duplicate point out. Additionally, you might want to read [this question](https://stackoverflow.com/q/32146555/5211833), especially the second answer, on doing a "sliced" `parfor`. – Adriaan Mar 28 '18 at 10:42
  • Thanks for helping me find the answers. It seemed like a simple problem at first, but it really is complex... – user2305193 Mar 28 '18 at 11:01
  • Something to consider when working with variables that can be larger than the available memory: [Tall arrays](https://www.mathworks.com/help/matlab/tall-arrays.html). – Dev-iL Mar 28 '18 at 11:08
  • If you're looking for a quick, and slightly dirty, solution: slice your data into smaller variables first, store those on disk (don't use dynamic variable names, use dynamic file names), and have each `parfor` iteration load only the slice it requires. This is conceptually simple and quick to implement, whilst giving a considerable speed-up. If you're planning to do more parallel work in the future and you have the time now: study the answers in the duplicate and the question I linked, and try slicing your variables properly. – Adriaan Mar 28 '18 at 11:13
  • I am looking for a quick & dirty solution, but unfortunately I'm low on local server disk space. On another note, I still haven't quite understood why my data variable is not sliced. I do use it via a cell array, and within the loop it's only accessed with `data{ind}`. I think slicing would be the quickest and dirtiest; `parfeval` solutions didn't really get me a good speed-up... – user2305193 Mar 28 '18 at 11:19
  • The problem is that a cell variable is not a sliced variable, i.e. the whole cell gets sent over to each worker; that was what I was initially doing in the duplicate target question as well. In my opinion this is a design flaw in the architecture of MATLAB's parallel toolbox, but that's how it is. You'll need to manually slice your variables, see the [documentation](https://mathworks.com/help/distcomp/sliced-variable.html). – Adriaan Mar 28 '18 at 11:21
  • Thanks for clearing that up. I didn't go into depth with the documentation; I presumed cell arrays are automatically treated as sliced, as long as you don't access any index outside the 'current loop'. – user2305193 Mar 28 '18 at 11:24
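
The "slices on disk" suggestion from the comments could look roughly like the sketch below. It is only an illustration under assumptions: `load_data` and `process_longtime` are the asker's functions, replaced here by hypothetical stand-ins so the snippet runs on its own, and the scratch folder and the `part_%d.mat` file names are made up for the example.

```matlab
% Sketch: write each part to its own file, then let every parfor
% iteration load only the part it needs.
% Hypothetical stand-ins for the asker's functions:
load_data        = @(ind) ind * ones(1000, 1);
process_longtime = @(part) sum(part);

nParts   = 4;
sliceDir = tempname;          % hypothetical scratch folder
mkdir(sliceDir);

% Serial pass: only one part is held in memory at a time.
for ind = 1:nParts
    part = load_data(ind);
    save(fullfile(sliceDir, sprintf('part_%d.mat', ind)), 'part');
    clear part
end

% Parallel pass: each iteration loads just its own file, so no worker
% ever receives the complete data set.
result = zeros(1, nParts);
parfor ind = 1:nParts
    s = load(fullfile(sliceDir, sprintf('part_%d.mat', ind)), 'part');
    result(ind) = process_longtime(s.part);
end

rmdir(sliceDir, 's');         % remove the scratch files again
```

If `load_data(ind)` already reads each part from its original source, calling it directly inside the `parfor` loop should give the same effect without the extra disk space. Variables larger than 2 GB need the `'-v7.3'` flag when calling `save`.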
