
I have a simple parfor loop given below.

% fileAddr is a cell array (of size N) of file addresses
sIdx = nan(N,1);
eIdx = nan(N,1);
errMsg = cell(N,1);
parfor i = 1:N
    [sIdx(i), eIdx(i), errMsg{i}] = myFun(fileAddr{i});
end

The function myFun() loads the file given by fileAddr{i}, performs some calculations, and returns the results. The file-loading part is the most time-consuming. My machine has 4 physical cores. I tried parfor() with a pool of 1, 2, 3, and 4 workers. Every time, the time consumption is in the same ballpark. My understanding was that if more than one worker is load()ing files in parallel, the program would run faster, but the profiler results show otherwise.

Can anyone please explain where I am making a mistake?

Adriaan
Abhinav
    [`parfor()` is ***NOT*** a magic wand](https://stackoverflow.com/q/32146555/5211833), don't treat it as such. If file loading is indeed the bottleneck of this operation, no parallelisation is going to help you speed up the code. HDD/SSD both have a finite read-speed, and if you maximise that you can't get faster. Parallellisation can only potentially help you when it's the *computations* that are the bottleneck. – Adriaan Jul 26 '18 at 15:59

1 Answer

You only have 1 hard drive, and only 1 worker can read from it at a time (it's a spinning disk with a magnetic head!). It's slower because the workers are waiting for their turn at the HDD, so you win no time. Add to that all the overhead of sending and sharing the data, and you make it slower.

Have you tried `spmd`? But I suspect it will end up with the same result you have with `parfor`.
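For reference, an untested sketch of what an `spmd` version of the same loop might look like (assuming `myFun`, `fileAddr`, and `N` as in the question; results come back as Composite objects on the client):

```matlab
% Sketch only: each worker takes an interleaved slice of the file list and
% processes it independently. Expect the same disk-read bottleneck as parfor.
spmd
    myIdx = labindex:numlabs:N;           % this worker's share of indices
    localS = nan(numel(myIdx), 1);
    localE = nan(numel(myIdx), 1);
    localMsg = cell(numel(myIdx), 1);
    for k = 1:numel(myIdx)
        [localS(k), localE(k), localMsg{k}] = myFun(fileAddr{myIdx(k)});
    end
end
% After the spmd block, localS{w} holds the results computed by worker w.
```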

Ander Biguri
  • So the only chance of making it any faster is by replacing the HDD with an SSD? – Abhinav Jul 26 '18 at 15:58
  • @Abhinav no, SSDs have the same limitation. It will be faster because SSDs are faster, not because of the `parfor` – Ander Biguri Jul 26 '18 at 15:59
  • No, I haven't tried `spmd`, but I imagine it will meet the same fate, as every iteration depends on a different file. I can try it though. – Abhinav Jul 26 '18 at 16:04
  • @Abhinav yes I suspect you will encounter the same issue. You have a data loading bottleneck. – Ander Biguri Jul 26 '18 at 16:05
  • @Abhinav: put the files on different physical disks. – Cris Luengo Jul 26 '18 at 16:05
  • @CrisLuengo Putting data on different disks.. hmm, good suggestion. The data is currently spread over two 4 TB disks, but the iterator filenames are in sequence, i.e. all files of disk-1 first, followed by all files of disk-2. Maybe if I put the files in alternating order (disk-1 file, then disk-2 file, etc.), it would eke out some saving. – Abhinav Jul 26 '18 at 16:10
  • @AnderBiguri: SSDs work a whole lot better with parallel tasks than rotating disks, because they don't have to seek. For a rotating disk, `parfor` suffers not just the overhead of thread synchronization (maybe 3-5% slowdown), but also the time wasted moving the physical heads back and forth between files (maybe 30,000-500,000% slowdown). SSDs do not have this limitation (they still have limited random-access speed, but it's pretty close to sequential-access speed). – Ben Voigt Jul 26 '18 at 16:15
  • @BenVoigt yes, that is what I meant with the comment. It will be faster because of the SSDs, not because of the `parfor`. I assume 4 sequential reads are equally fast, if a good compiler (JIT or otherwise) exists. – Ander Biguri Jul 26 '18 at 16:20
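The alternating-order idea from the comments could be sketched as follows (untested; `disk1Files` and `disk2Files` are assumed to be cell arrays of file addresses of equal length, one per physical disk):

```matlab
% Sketch only: interleave the file lists of the two disks so that
% consecutive parfor iterations tend to hit different physical disks.
fileAddr = cell(2*numel(disk1Files), 1);
fileAddr(1:2:end) = disk1Files;   % odd slots: files on disk-1
fileAddr(2:2:end) = disk2Files;   % even slots: files on disk-2
```

Whether this helps depends on how the scheduler assigns iterations to workers, but it at least removes the guarantee that all workers queue on the same disk at once.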