I am trying to parallelize a piece of code using the multiprocessing library, the outline of which is as follows:
import multiprocessing as mp
import time

def stats(a, b, c, d, queue):
    t = time.time()
    x = stuff  # placeholder for the actual calculations
    print(f'runs in {time.time() - t} seconds')
    queue.put(x)

def stats_wrapper(a, b, c, d):
    # b here is a dictionary
    q = mp.Queue()
    processes = [mp.Process(target=stats, args=(a, {b1: b[b1]}, c, d, q))
                 for b1 in b]
    for p in processes:
        p.start()
    results = []
    for p in processes:
        results.append(q.get())
    return results
The idea here is to parallelize the calculations in stuff by creating as many processes as there are key:value pairs in b, which can go up to 50. The speedup I'm getting is only about 20% compared to the serial implementation.
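For reference, the comparison I'm making is along these lines; serial_stats is just a stand-in name for my non-parallel version (which returns its result instead of using a queue), and the inputs here are toy values:

import time

a, c, d = 1, 2, 3
b = {f'key{i}': i for i in range(50)}  # the real values are much larger

t0 = time.time()
serial_results = [serial_stats(a, {k: b[k]}, c, d) for k in b]  # hypothetical serial version
print(f'serial: {time.time() - t0:.2f} s')

t0 = time.time()
parallel_results = stats_wrapper(a, b, c, d)
print(f'parallel: {time.time() - t0:.2f} s')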
I tried rewriting the stats_wrapper function with mp.Pool, but couldn't get apply_async to work with a different set of args per process, the way I set it up in stats_wrapper above.
Questions:
- How can I improve the stats_wrapper function to bring down the runtime further?
- Is there a way to rewrite stats_wrapper using mp.Pool so that the stats function can receive a different value of the parameter b in each process? Something like the following (see also the fuller sketch after these questions):

def pool_stats_wrapper(a, b, c, d):
    q = mp.Queue()
    pool = mp.Pool()
    pool.apply_async(stats, ((a, {b1: b[b1]}, c, d) for b1 in b))  # Haven't figured this part out yet
    pool.close()
    results = []
    for _ in range(len(b)):
        results.append(q.get())
    return results
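To make the intent concrete, here is a minimal sketch of what I imagine the Pool version might look like, assuming apply_async is called once per key and results are collected from the returned AsyncResult objects instead of a Queue; stats_returning is a hypothetical variant of stats that returns x rather than putting it on a queue, and I haven't verified that this is the right approach:

import multiprocessing as mp

def pool_stats_wrapper(a, b, c, d):
    # One task per key of b; each task gets its own single-entry dict.
    with mp.Pool() as pool:
        async_results = [
            pool.apply_async(stats_returning, (a, {b1: b[b1]}, c, d))  # hypothetical: stats that returns x
            for b1 in b
        ]
        # AsyncResult.get() blocks until the corresponding worker has finished.
        return [r.get() for r in async_results]

If that pattern is valid, the Queue would drop out entirely, which is part of what I'm unsure about.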
I'm running the code on an 8-core, 16 GB machine if that helps.