How can I parallelize a for loop in python using multiprocessing package?

Question

Note: I don't need any communication between the processes/threads, I'm interested in completion signal only (that's the reason I posted this question as a new one since all other examples I've found communicated between each other).

How can I use multiprocessing package in Python 3 to parallelize the following piece of code (the end goal is to make it run faster):

a = 123
b = 456
for id in ids: # len(ids) = 10'000
   # executes a binary with CLI flags
   run_binary_with_id(id, a, b) 
   # i.e. runs "./hello_world_exec --id id --a a --b b" which takes about 30 seconds on average

I tried the following:

import multiprocessing as mp

def run_binary_with_id(id, a, b):
    run_command('./hello_world_exec --id {} --a {} --b {}'.format(id, a, b))

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    a = 123
    b = 456
    ids = range(10000)
    for id in ids:
       p = ctx.Process(target=run_binary_with_id, args=(id,a,b))
       p.start()
    p.join()
    # The binary was executed len(ids) number of times, do other stuff assuming everything's completed at this point

or

for id in ids:
   map.apply_async(run_binary_with_id, (id,a,b))

In a similar question the answer is the following:

def consume(iterator):
    deque(iterator, max_len=0)
x=pool.imap_unordered(f,((i,j) for i in range(10000) for j in range(10000)))
consume(x)

which I don't really understand at all (why do I need this consume()).

blhsing · Accepted Answer · 2019-04-15T20:46:53.790

Trying to spawn 10000 processes to run in parallel is almost certainly going to overload your system and make it run slower than running the processes sequentially due to the overhead involved in the OS having to constantly perform context switching between processes when the number of processes far exceeds the number of CPUs/cores your system has.

You can instead use multiprocessing.Pool to limit the number of worker processes spawned for the task. The Pool constructor limits the number of processes to the number of cores your system has by default, but you can fine tune it if you'd like with the processes parameter. You can then use its map method to easily map a sequence of arguments to apply to a given function to run in parallel. It can only map one argument to the function, however, so you would have to use functools.partial to supply default values for the other arguments, which in your case do not change between calls:

from functools import partial
if __name__ == '__main__':
    _run_binary_with_id = partial(run_binary_with_id, a=123, b=456)
    with mp.Pool() as pool:
        pool.map(_run_binary_with_id, range(10000))

That makes sense, I've got one question though: can I explicitly replace `range(10000)` with `ids` in your code (1) and where do I pass `id` to `run_binary_with_id(id, a, b)` (2)? — James Larkin, Apr 15 '19 at 20:34
oh, I see, `id` was passed automatically, e.g. see the following piece of code: https://pastebin.com/M2ZEfKsP, does it make sense? — James Larkin, Apr 15 '19 at 20:41
Yes exactly. The values in the sequences are passed to the function as the first argument, so you don't have to specify the name of the argument. — blhsing, Apr 15 '19 at 20:42

How can I parallelize a for loop in python using multiprocessing package?

1 Answers1