Octave parallel function worsens the running time on a single machine

Question

I wrote a test script in Octave to evaluate the time efficiency of parallelization on a single machine, my Windows computer with an 8-core processor. I started with the simple example provided in the documentation:

pkg load parallel
fun = @(x) x^2;
vector_x = 1:20000;

# Serial Format of the Program
tic()
for i=1:20000
vector_y(1,i)=vector_x(1,i)^2;   
endfor
toc()

# Parallel Format of the Program
tic()
vector_y1 = pararrayfun(nproc, fun, vector_x);
toc()

To my surprise, the serial code is much faster than the parallel function: the serial case ran in 0.0758219 s, the parallel one in 3.79864 s.

Could someone explain whether this is parallel overhead or whether I need to set something up in my Octave settings? In which cases is parallelization really helpful?

Likely just overhead. Starting a parallel environment takes time. Not sure if Octave has it, but in MATLAB you can force the "parallel pool" to start before the computation (which often takes more than 4 s), so that its setup time is not included in your timing. That said, if your single-CPU code takes `0.07` s, it probably won't get much faster in parallel, considering that MATLAB, at least, has OpenMP implementations of standard operations. — Ander Biguri, Feb 08 '21 at 12:04

Adriaan · Answer 1 · 2021-02-08T12:48:32.130

TL;DR: open up your pool outside the timer and choose a more difficult operation.

There are two main issues. One is what Ander mentioned in his comment: starting up the parallel pool takes a second or two. You can open it beforehand (in MATLAB you can do this via parpool) so that this cost is not part of your timing. Alternatively, run a single parallel operation, thus opening the pool, and then redo the timing.
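Octave has no parpool. As far as I understand the parallel package, pararrayfun starts its worker processes inside each call rather than reusing a persistent pool, so there may be nothing to open beforehand. A quick way to check this on your own machine is to time the same call twice, reusing fun and vector_x from the question; t_par and trial are just illustrative names. If the second call is about as slow as the first, the overhead is paid on every call, not only once at startup.

```octave
pkg load parallel

fun = @(x) x^2;
vector_x = 1:20000;

% Time the same parallel call twice. If the second call is about as slow
% as the first, the overhead is per call rather than a one-time startup cost.
t_par = zeros(1, 2);
for trial = 1:2
  t0 = tic();
  vector_y1 = pararrayfun(nproc, fun, vector_x);
  t_par(trial) = toc(t0);
endfor
printf("first call: %.3f s, second call: %.3f s\n", t_par(1), t_par(2));
```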

The second issue is the simplicity of your operation. Just squaring a number cannot go much faster than it already does in serial, so there is no point in passing data back and forth between workers for such a simple operation. Redo your test with a more expensive function, e.g. eig(), as MATLAB does in its examples.
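A sketch of such a test in Octave, assuming the parallel package is installed. The task function heavy is hypothetical: it returns the largest eigenvalue of a 500x500 symmetric matrix that depends on k, so each call does real work. Whether the parallel version wins still depends on your machine, and since eig may already run on a multithreaded BLAS/LAPACK, the serial baseline might not be single-core either.

```octave
pkg load parallel

% Hypothetical heavier task: largest eigenvalue of a symmetric 500x500
% Toeplitz matrix shifted by k. Far more work per call than squaring a scalar.
heavy = @(k) max(eig(toeplitz(1:500) + k * eye(500)));
ntasks = 32;

% Serial baseline with a preallocated output.
t0 = tic();
y_ser = zeros(1, ntasks);
for k = 1:ntasks
  y_ser(k) = heavy(k);
endfor
t_ser = toc(t0);

% The same tasks spread over all cores.
t0 = tic();
y_par = pararrayfun(nproc, heavy, 1:ntasks);
t_par = toc(t0);

printf("serial: %.2f s, parallel: %.2f s\n", t_ser, t_par);
```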

Parallelisation is thus useful if the runtime of your operations greatly outweighs the overhead of passing data to and from workers. Basically, this means you either have a very large data set and need to perform the same operation on each item (e.g. taking the mean of every 1000 rows or so), or you have a few heavy but independent tasks to perform.
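For the question's data, one way to cut the communication overhead is to hand each worker one large chunk instead of 20000 single numbers. This is a minimal sketch, assuming one chunk per core is a sensible split; squaring is still so cheap that it will probably not beat the serial version, and the point is only how the work is divided. nworkers, sizes and chunks are illustrative names.

```octave
pkg load parallel

vector_x = 1:20000;
nworkers = nproc;

% Split the data into one contiguous chunk per worker instead of 20000
% tiny tasks. The chunk sizes must add up to numel(vector_x).
base = floor(numel(vector_x) / nworkers);
sizes = base * ones(1, nworkers);
sizes(end) += numel(vector_x) - sum(sizes);   % last chunk takes the remainder
chunks = mat2cell(vector_x, 1, sizes);

% Each worker squares a whole chunk with a vectorized operation.
squared = parcellfun(nworkers, @(c) c.^2, chunks, "UniformOutput", false);
vector_y = cell2mat(squared);
```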

For a more in-depth explanation I can recommend this answer of mine and references therein.

Just as a side note, I'm surprised your serial for loop is that fast, given that you do not initialise your output vector. Preallocation is very important, as "growing" an array in a loop requires creating a new array and copying all previous content into it on every iteration.
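A sketch of the question's serial loop with the output preallocated, plus the vectorized form that usually makes such a loop unnecessary for element-wise operations like this one (n and vector_y2 are illustrative names):

```octave
vector_x = 1:20000;
n = numel(vector_x);

% Preallocate the output once so the loop never has to grow the array.
vector_y = zeros(1, n);
for idx = 1:n
  vector_y(idx) = vector_x(idx)^2;
endfor

% For an element-wise operation, a vectorized expression avoids the loop.
vector_y2 = vector_x.^2;
```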
You also might want to consider not using i or j as variable names, as they denote the imaginary unit. It won't affect runtime much, but it can result in very hard-to-debug errors. Simply use idx, ii, or a more descriptive variable name.
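A short Octave illustration of the kind of error meant here: once a loop has used i as its index, i no longer refers to the imaginary unit. Writing complex literals as 1i or 3i sidesteps the problem, since such literals always denote the imaginary unit (z_shadowed and z_complex are illustrative names).

```octave
% After this loop, i is no longer the imaginary unit: it holds the last
% loop value, 3.
for i = 1:3
endfor
z_shadowed = 2 + 3*i;   % evaluates to 11, a real number

% A literal such as 3i always means the imaginary unit,
% regardless of any variable named i.
z_complex = 2 + 3i;
```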

I see this "don't use i" advice from time to time, but I don't agree with it. `i` as a loop index in simple for loops is pretty standard notation, and `ii` just looks ugly. Octave is smart enough to treat the imaginary unit in the context of complex numbers appropriately without confusion. — Tasos Papastylianou, Feb 08 '21 at 22:04
@TasosPapastylianou sure, and for simple loops I do use `idx` or `ii`. The problem with using `i`, IMO, is that you get used to it and will also use it in larger loops. That's the point where you get into trouble with these hard-to-debug errors. As the linked Q/A says, there is indeed no problem with using them, but it may lead to such errors. I'd rather avoid them upfront than attempt to debug them later. — Adriaan, Feb 09 '21 at 08:11
