
Using multiprocessing, I tried to parallelize a function, but I see no performance improvement:

from MMTK import *
from MMTK.Trajectory import Trajectory, TrajectoryOutput, SnapshotGenerator
from MMTK.Proteins import Protein, PeptideChain
import numpy as np
import time

filename = 'traj_prot_nojump.nc'

trajectory = Trajectory(None, filename)

def calpha_2dmap_mult(trajectory = trajectory, t = range(0,len(trajectory))):
    dist = []
    universe = trajectory.universe
    proteins = universe.objectList(Protein)
    chain = proteins[0][0]
    traj = trajectory[t]
    dt = 1000 # calculate distance every 1000 steps
    for n, step in enumerate(traj):
        if n % dt == 0:
            universe.setConfiguration(step['configuration'])
            for i in np.arange(len(chain)-1):
                for j in np.arange(len(chain)-1):
                    dist.append(universe.distance(chain[i].peptide.C_alpha,
                                                  chain[j].peptide.C_alpha))
    return(dist)

c0 = time.time()
dist1 = calpha_2dmap_mult(trajectory, range(0,11001))
c1 = time.time() - c0
print(c1)


# Multiprocessing
from multiprocessing import Pool, cpu_count

pool = Pool(processes=4)
c0 = time.time()
dist_pool = [pool.apply(calpha_2dmap_mult, args=(trajectory, t,)) for t in
             [range(0,2001), range(3000,5001), range(6000,8001),
              range(9000,11001)]]
c1 = time.time() - c0
print(c1)

The time spent calculating the distances is the 'same' without (70.1 s) or with multiprocessing (70.2 s)! I wasn't necessarily expecting a factor-of-4 improvement, but I was at least expecting some improvement! Does anyone know what I did wrong?

guillaume
  • Note: Because of the GIL, doing CPU heavy work in threads often doesn't work as expected in Python. https://wiki.python.org/moin/GlobalInterpreterLock – Aaron Digulla Oct 14 '14 at 09:32
  • @AaronDigulla That is why "multiprocessing is a package that supports spawning processes" – user2864740 Oct 14 '14 at 21:14

1 Answer


Pool.apply is a blocking operation:

[Pool.apply is the] equivalent of the apply() built-in function. It blocks until the result is ready, so apply_async() is better suited for performing work in parallel ...
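For example, a minimal sketch of the non-blocking variant, reusing the pool, calpha_2dmap_mult, and trajectory from the question (untested against MMTK here):

# Submit all chunks up front; apply_async returns an AsyncResult
# immediately instead of blocking the way apply does.
async_results = [pool.apply_async(calpha_2dmap_mult, args=(trajectory, t))
                 for t in [range(0, 2001), range(3000, 5001),
                           range(6000, 8001), range(9000, 11001)]]

# get() blocks per result, but the four workers run concurrently,
# and iterating in submission order keeps the results in order.
dist_pool = [r.get() for r in async_results]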

In this case Pool.map is likely more appropriate for collecting the results; the map call itself blocks, but the sequence elements/transformations are processed in parallel.


In addition to using partial application (or a manual realization of such), also consider expanding the data itself. It's the same cat, skinned differently.

data = ((trajectory, r) for r in [range(0,2001), ..])
result = pool.map(.., data)

This can in turn be expanded:

def apply_data(d):
    return calpha_2dmap_mult(*d)

result = pool.map(apply_data, data)

The function (or a simple argument-expanding proxy of such) will need to be written to accept a single argument, but all the data is now mapped as a single unit.
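If map with multiple arguments is preferred, here is a hedged sketch using functools.partial to bind the trajectory, with calpha_2dmap_mult and trajectory as in the question:

from functools import partial

# Bind trajectory up front; the resulting callable takes only the range
# argument, which is the single-argument shape Pool.map expects. Note that
# partial must wrap a function defined at a module's top level so it can
# be pickled for the worker processes.
worker = partial(calpha_2dmap_mult, trajectory)
result = pool.map(worker, [range(0, 2001), range(3000, 5001),
                           range(6000, 8001), range(9000, 11001)])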

user2864740
  • I cannot use apply_async() because I want the results in order. I thought Pool.apply would be equivalent to Pool.map but with the advantage of allowing several arguments. How can I use Pool.map with several arguments? – guillaume Oct 14 '14 at 15:28
  • @guillaume You can still get the results in order with `apply_async`. Just do: `final_dist_pool = [r.get() for r in dist_pool]` after the initial calls to `apply_async`. However, if you want to use `map` with multiple args instead, you can use `functools.partial` to enable passing multiple arguments. See [here](http://stackoverflow.com/questions/25553919/passing-multiple-parameters-to-pool-map-function-in-python/25553970#25553970). – dano Oct 14 '14 at 16:00
  • Thank you very much for your answers. You were right, `apply_async` works like a charm and gives good performance (~20 s vs. ~70 s), and using `[r.get() ...]` also keeps the results in order. @dano Thanks for the hint about `functools.partial` with `map`; it gives similar performance compared to `apply_async`. – guillaume Oct 15 '14 at 14:55
  • Note that you can't use a `lambda` function with `Pool.map`. You have to use a function defined at the top level of the module, or a `functools.partial` (which must consume a function declared at the top-level of the module). – dano Oct 15 '14 at 14:58
  • @user2864740 The `lambda` function won't pickle/unpickle properly. – dano Oct 15 '14 at 19:48
  • @dano Magic still gets me - thought it was only the data subject to such :< It's what I get for not being hands-on. I've updated the example, thanks for the correction(s). – user2864740 Oct 15 '14 at 19:50
  • @guillaume another alternative to use several arguments is `Pool.starmap` which unpacks a single tuple into its multiple parts (as in `*args`) – AngryUbuntuNerd Jan 17 '22 at 18:23