
I am trying to parallelize a piece of code using the multiprocessing library; the outline of it is as follows:

import multiprocessing as mp
import time

def stats(a, b, c, d, queue):
    t = time.time()
    x = stuff  # placeholder for the actual calculation
    print(f'runs in {time.time() - t} seconds')
    queue.put(x)

def stats_wrapper(a, b, c, d):
    q = mp.Queue()
    # b here is a dictionary; one process per key:value pair
    processes = [mp.Process(target=stats, args=(a, {b1: b[b1]}, c, d, q))
                 for b1 in b]
    for p in processes:
        p.start()
    results = []
    for p in processes:
        results.append(q.get())
    return results

The idea here is to parallelize the calculations in stuff by creating as many processes as there are key:value pairs in b, which can go up to 50. The speedup I'm getting is only about 20% compared to the serial implementation.

I tried rewriting the stats_wrapper function with mp.Pool, but couldn't get apply_async to work with a different set of args per task, the way I set them up in stats_wrapper above.

Questions:

  1. How can I improve the stats_wrapper function to bring down the runtime further?
  2. Is there a way to rewrite stats_wrapper using mp.Pool so that each process receives a different value for the parameter b of stats? Something like:

    def pool_stats_wrapper(a, b, c, d):
        q = mp.Queue()
        pool = mp.Pool()

        pool.apply_async(stats, ((a, {b1: b[b1]}, c, d) for b1 in b))  # Haven't figured this part out yet
        pool.close()
        results = []
        for _ in range(len(b)):
            results.append(q.get())
        return results
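
For what it's worth, this is roughly the shape I imagine a Pool-based version could take, if stats were changed to return x instead of putting it on a queue (stats_returning below is a hypothetical variant, and this is only a sketch of what I'm aiming for, not something I have working):

    def stats_returning(a, b, c, d):
        # hypothetical variant of stats that returns x instead of queue.put(x)
        t = time.time()
        x = stuff  # the same placeholder computation as in stats
        print(f'runs in {time.time() - t} seconds')
        return x

    def pool_stats_wrapper_sketch(a, b, c, d):
        with mp.Pool() as pool:  # defaults to one worker per CPU core
            # one apply_async call per key of b, each with its own {key: value} slice
            async_results = [pool.apply_async(stats_returning, (a, {b1: b[b1]}, c, d))
                             for b1 in b]
            # .get() blocks until each task's result is ready
            return [r.get() for r in async_results]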
    

I'm running the code on an 8-core, 16 GB machine, if that helps.
