I have seen other related questions (like this one) but none of them actually answers my question, so here goes:
I have an obviously embarrassingly parallel task to perform: my own hand-rolled version of GridSearch. In simple words, I have a set of parameters and want to evaluate my model on each of them. There is no dependence between those runs, so the code looks like this:
import multiprocessing

pool = multiprocessing.Pool(processes=4)    # one worker per parameter set
scores = pool.map(evaluator, permutations)  # blocks until all evaluations finish
where evaluator is a function that computes a score given a dict of parameters, and permutations is a list of such dictionaries (of length 4 in this case).
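For completeness, a stripped-down, self-contained version of the setup looks like this; the evaluator body and the parameter values below are just CPU-bound placeholders standing in for my real model:

import multiprocessing

def evaluator(params):
    # Placeholder for the real model evaluation: pure CPU-bound work that
    # takes roughly the same time for every parameter set.
    total = 0.0
    for i in range(10_000_000):
        total += (i % 7) * params["alpha"]
    return total

if __name__ == "__main__":
    # One dict per run; 4 parameter sets to match the 4 worker processes.
    permutations = [{"alpha": a} for a in (0.1, 0.5, 1.0, 2.0)]

    pool = multiprocessing.Pool(processes=4)
    scores = pool.map(evaluator, permutations)
    pool.close()
    pool.join()
    print(scores)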
Now my assumption is that using 4 processes (on an 8-core machine) should give me a 4x speedup (note that the evaluator takes the same amount of time regardless of the parameter set, so the load is perfectly balanced).
Instead, my timing yielded these results:
Using 4 processes, each evaluation takes 82 sec to complete, so the total time is 84 sec.
Using 1 process, each evaluation takes 43 sec to complete, so the total time is 170 sec.
So in the end I only get a 2x speedup from 4 cores. Why is each individual evaluation faster when there are fewer processes?
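In case the measurement itself matters, here is a self-contained way to reproduce the comparison (again with a placeholder evaluator rather than my real model), timing both each worker's own evaluation and the total wall time:

import multiprocessing
import time

def evaluator(params):
    # CPU-bound placeholder for the real model evaluation.
    total = 0.0
    for i in range(10_000_000):
        total += (i % 7) * params["alpha"]
    return total

def timed_evaluator(params):
    # Report each worker's own elapsed time alongside its score.
    start = time.perf_counter()
    score = evaluator(params)
    return score, time.perf_counter() - start

if __name__ == "__main__":
    permutations = [{"alpha": a} for a in (0.1, 0.5, 1.0, 2.0)]
    for n_procs in (1, 4):
        wall_start = time.perf_counter()
        with multiprocessing.Pool(processes=n_procs) as pool:
            results = pool.map(timed_evaluator, permutations)
        wall = time.perf_counter() - wall_start
        per_task = ", ".join("%.1fs" % t for _, t in results)
        print("%d process(es): per-task [%s], total wall %.1fs" % (n_procs, per_task, wall))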