Why is this multiprocessing code slower than the serial one?

Question

I ran the following Python programs, a sequential and a parallel version, on a cluster computing facility. Using the top command, I could clearly see more processes starting for the parallel program. But when I time them, the parallel version takes longer. What could be the reason? The code and timing information are below.

# parallel.py
from multiprocessing import Pool
import numpy
def sqrt(x):
    return numpy.sqrt(x)
pool = Pool()
results = pool.map(sqrt, range(100000), chunksize=10)

# seq.py
import numpy
def sqrt(x):
    return numpy.sqrt(x)
results = [sqrt(x) for x in range(100000)]

user@domain$ time python parallel.py > parallel.txt
real    0m1.323s
user    0m2.238s
sys     0m0.243s

user@domain$ time python seq.py > seq.txt
real    0m0.348s
user    0m0.324s
sys     0m0.024s

MPI != multiprocessing! You should specify much more information about the system you are using this code on. — Zulan, May 15 '17 at 17:27
The cluster has one master node and two compute nodes. Each node has 6 cores. For the parallel version, CPU usage goes above 130%. – Sajil C K, May 15 '17 at 17:49
Python multiprocessing [does not distribute work among multiple nodes](https://stackoverflow.com/questions/5181949/using-the-multiprocessing-module-for-cluster-computing). — Zulan, May 15 '17 at 17:56
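
One way to check this on a given machine is the minimal sketch below. The helper names `worker_location` and `collect_worker_locations` are illustrative and not from the original post. Every worker should report the same host name, since `Pool` starts its workers as local processes.

```python
import os
import socket
from multiprocessing import Pool


def worker_location(_):
    """Return the host name and process ID of the worker handling this task."""
    return socket.gethostname(), os.getpid()


def collect_worker_locations(tasks=100):
    """Run trivial tasks in a Pool and return the set of (host, pid) pairs seen."""
    with Pool() as pool:
        return set(pool.map(worker_location, range(tasks), chunksize=1))


if __name__ == "__main__":
    locations = collect_worker_locations()
    hosts = {host for host, _ in locations}
    print(f"hosts: {hosts}")
    print(f"worker processes used: {len(locations)}")
```

If more than one host name ever appeared here, the work would be crossing nodes; with a plain `Pool()` that is not expected to happen.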

Zulan · Answer 1 · 2017-05-17T07:08:58.637


The amount of work per task is far too small to compensate for the work-distribution overhead. First, you should increase the chunksize, but even then a single square root operation is too short to offset the cost of sending data between processes. You can see an effective speedup from something like this:

def sqrt(x):
  for _ in range(100):
    x = numpy.sqrt(x)
  return x
results = pool.map(sqrt, range(10000), chunksize=100)
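
To measure the effect rather than take it on faith, a rough timing harness along these lines may help. It is a sketch under some assumptions: it uses `math.sqrt` from the standard library instead of `numpy.sqrt` so that it runs without extra dependencies, and the names `repeated_sqrt`, `time_serial`, `time_parallel` and `REPEATS` are made up for illustration. The parallel timing deliberately includes pool start-up, which is part of the overhead being discussed.

```python
import math
import time
from multiprocessing import Pool

REPEATS = 100  # inner iterations per task, mirroring the loop in the answer


def repeated_sqrt(x):
    """Apply the square root REPEATS times so each task does non-trivial work."""
    for _ in range(REPEATS):
        x = math.sqrt(x)
    return x


def time_serial(n):
    """Return (elapsed seconds, results) for a plain list comprehension."""
    start = time.perf_counter()
    results = [repeated_sqrt(x) for x in range(n)]
    return time.perf_counter() - start, results


def time_parallel(n, chunksize):
    """Return (elapsed seconds, results) for Pool.map, including pool start-up."""
    start = time.perf_counter()
    with Pool() as pool:
        results = pool.map(repeated_sqrt, range(n), chunksize=chunksize)
    return time.perf_counter() - start, results


if __name__ == "__main__":
    n = 10000
    serial_time, serial_results = time_serial(n)
    print(f"serial: {serial_time:.3f} s")
    for chunksize in (1, 10, 100, 1000):
        parallel_time, parallel_results = time_parallel(n, chunksize)
        # Both versions should compute exactly the same values.
        assert parallel_results == serial_results
        print(f"parallel, chunksize={chunksize}: {parallel_time:.3f} s")
```

The absolute numbers depend on the machine, the number of cores and the Python version, so treat the output as a comparison between settings rather than a benchmark. Changing `REPEATS` and `chunksize` shows how the work per task and the batch size shift the balance.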

edited May 17 '17 at 07:08

answered May 15 '17 at 17:35


This code gives the expected results:

from multiprocessing import Pool
import numpy
def sqrt(x):
  for _ in range(100):
    x = numpy.sqrt(x)
  return x
pool = Pool()
results = pool.map(sqrt, range(10000), chunksize=1000)

Serial 1.121 sec, parallel 0.955 sec. – Sajil C K, May 15 '17 at 18:19
