I'm currently using Python's multiprocessing module with a Pool to run a function millions of times in parallel. While multiprocessing works, the function is so lightweight that barely 30% of each core is used, and the workers only max out during lock acquisition. Looking at my script's profile, locking is indeed the most expensive part.
Because each function call is so short, the overhead of locking every time an item is dispatched to the function outweighs the work itself; in fact, I get better performance running it serially (15 mins parallelized vs. 4.5 mins serial).
The function writes to independent files, so the calls are completely independent. Is it possible to 'mimic' running/calling the same parallelized Python script multiple times (with different inputs) to make better use of the CPU?
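Concretely, what I have in mind by 'calling the script multiple times' is something like the rough sketch below (parse_script.py and passing file paths on the command line are hypothetical; the real script would have to accept its inputs that way):

import subprocess
import sys
from multiprocessing import cpu_count

# Hypothetical sketch: launch one independent copy of a (hypothetical) parse_script.py
# per core, each with its own round-robin slice of the input files.
# For millions of inputs the slices would realistically be passed via a file, not argv.
n = cpu_count()
slices = [pubfiles[i::n] for i in range(n)]
procs = [subprocess.Popen([sys.executable, 'parse_script.py', *chunk]) for chunk in slices]
for p in procs:
    p.wait()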
Current Code:
from multiprocessing import Pool, cpu_count, Lock
import tqdm

pool = Pool(cpu_count(), initializer=tqdm.tqdm.set_lock, initargs=(Lock(),))
for _ in tqdm.tqdm(pool.imap_unordered(parallel_process, pubfiles, chunksize=70), total=nfiles, desc='Parsing files'):
    pass
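An equivalent idea that stays inside one script would be to give each worker a single large slice of the file list, so there is one dispatch per process instead of one per file (sketch only; parse_batch is a hypothetical wrapper that just loops the existing parallel_process over its slice):

from multiprocessing import Pool, cpu_count

def parse_batch(file_list):
    # Hypothetical wrapper: run the existing per-file function over a whole slice.
    for f in file_list:
        parallel_process(f)

if __name__ == '__main__':
    n = cpu_count()
    slices = [pubfiles[i::n] for i in range(n)]   # one slice per core
    with Pool(n) as pool:
        pool.map(parse_batch, slices)             # a single task per worker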
EDIT:
To confirm it has nothing to do with tqdm's locking, modifying the code to the following exhibits the same issue:
pool = Pool(cpu_count())
for i in pool.imap_unordered(parallel_process, files, chunksize=70):
    print(i)
I've profiled my code for a while, and the most expensive calls appear to be related to locking (?)/multiprocessing in general; the actual function sits very near the bottom of the cumulative time.
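For reference, a profile like that can be produced roughly as follows (sketch; main() is a hypothetical wrapper around the pool loop above, and note that cProfile only measures the parent process, i.e. the dispatch/locking side):

import cProfile
import pstats

cProfile.run('main()', 'dispatch.prof')   # main() = hypothetical wrapper around the pool loop
pstats.Stats('dispatch.prof').sort_stats('cumulative').print_stats(20)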