
I would like to run a function with different arguments. For each argument, I would like to run the function in parallel and then collect the output of each run. It seems that the multiprocessing module can help here, but I am not sure about the right steps to make this work.

Do I start all the processes, then get all the results from the queue, and then join all the processes, in that order? Or do I get the results after I have joined? Or should I get the ith result after joining the ith process?

from numpy.random import uniform
from multiprocessing import Process, Queue

def function(x):
    return uniform(0.0, x)

if __name__ == "__main__":
    queue = Queue()
    processes = []
    x_values = [1.0, 10.0, 100.0]
    
    # Start all processes
    for x in x_values:
        process = Process(target=function, args=(x, queue, ))
        processes.append(process)
        process.start()

    # Grab results of the processes?
    outputs = [queue.get() for _ in range(len(x_values))]
    
    # Not even sure what this does but apparently it's needed
    for process in processes:
        process.join()
  • Could you expound on what the processes are doing? Are they returning some values to you, or are they involved with something else? Personally I'd use multiprocessing pools. Also note that spawning more processes than you have cores doesn't really gain you anything. And Pool, to me, is a little more intuitive than manually starting processes, especially if you have a lot of x_values, as in your case. – Jason Chia Sep 02 '21 at 13:16
  • @JasonChia Thank you for your comment. Basically, you can think of the function that I want to run as an experiment. I would like to run the experiment, say, 100 times in parallel and store the output (which is a dictionary in my actual use case). The reason I am doing this is that I want to see how my experiment behaves on average, but each experimental run takes about 1 hour, so I want to parallelize it. – Euler_Salter Sep 02 '21 at 21:34
  • @JasonChia Does it make sense? How would you use pools? If you could show me please, you would be my hero! – Euler_Salter Sep 02 '21 at 21:34

2 Answers

1

So let's make a simple example for multiprocessing pools, with a function that sleeps for 3 seconds and returns both the value passed to it (your parameter) and the result of the function, which is just doubling it. IIRC there's some issue with stopping pools cleanly, which the KeyboardInterrupt handling below works around.

from multiprocessing import Pool
import time

class KeyboardInterruptError(Exception):
    """Re-raised inside workers so the parent process can handle the interrupt."""
    pass

def time_waster(val):
    try:
        time.sleep(3)
        return (val, val*2)  # return a tuple here, but you can use a dict as well with all your parameters
    except KeyboardInterrupt:
        raise KeyboardInterruptError()

if __name__ == '__main__':
    x = list(range(5))  # values to pass to the function
    results = []
    p = Pool(2)  # I use 2, but you can use as many as you have cores
    try:
        results.append(p.map(time_waster, x))
        p.close()  # no more work to submit; lets join() return once the workers finish
    except KeyboardInterrupt:
        p.terminate()
    except Exception:
        p.terminate()
    finally:
        p.join()
    print(results)

As an extra service, I added some KeyboardInterrupt handlers, as IIRC there are some issues with interrupting pools: https://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool
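
For the use case described in the comments (an experiment that returns a dict, run many times), the same Pool.map pattern could look roughly like the sketch below; run_experiment and its body are hypothetical stand-ins for the real experiment:

from multiprocessing import Pool
from numpy.random import uniform

def run_experiment(x):
    # hypothetical stand-in for the real ~1 hour experiment
    return {"x": x, "result": uniform(0.0, x)}

if __name__ == "__main__":
    x_values = [1.0, 10.0, 100.0] * 33  # roughly 100 runs
    with Pool() as p:  # Pool() defaults to one worker per CPU core
        outputs = p.map(run_experiment, x_values)  # list of dicts, in input order
    print(outputs[:3])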

Jason Chia
  • Thank you! I have a machine with 8 cores and have tried your solution with the input x of length `5` and of length `8` (keeping `Pool(8)`) but the times are quite different. This tells me it is not really doing anything in parallel – Euler_Salter Sep 07 '21 at 08:32
  • That's pretty much because it will take 3 seconds to run all 5 tasks in your pool of 8, plus overhead. If you compare with running them in a for loop, you can very obviously see the benefit. You can try x with length 16 instead of 5, and you will get approximately 3*2 seconds + overhead, whereas a for loop will take 16*3 seconds (see the timing sketch below). – Jason Chia Sep 07 '21 at 12:22
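
To make that comparison concrete, here is a rough timing sketch (it assumes an 8-core machine; KeyboardInterrupt handling is omitted for brevity, and exact numbers will vary with process start-up overhead):

from multiprocessing import Pool
import time

def time_waster(val):
    time.sleep(3)
    return (val, val * 2)

if __name__ == '__main__':
    x = list(range(16))

    start = time.perf_counter()
    with Pool(8) as p:
        pool_results = p.map(time_waster, x)
    print(f"Pool(8):  {time.perf_counter() - start:.1f}s")  # ~6 s: two batches of 8 tasks, 3 s each

    start = time.perf_counter()
    loop_results = [time_waster(v) for v in x]
    print(f"for loop: {time.perf_counter() - start:.1f}s")  # ~48 s: 16 tasks run one after another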
0

proc.join() blocks until the process has ended. queue.get() blocks until there is something in the queue. Because your processes don't put anything into the queue (in this example), this code will never get beyond the queue.get() part... If your processes put something in the queue at the very end, then it doesn't matter whether you join() or get() first, because they happen at about the same time.
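
For completeness, here is a minimal sketch of the question's Process/Queue approach with the worker actually putting its result into the queue; the worker name and the get-before-join order are just one reasonable arrangement, not the only possibility:

from numpy.random import uniform
from multiprocessing import Process, Queue

def worker(x, queue):
    queue.put(uniform(0.0, x))  # the worker must put its result into the queue

if __name__ == "__main__":
    queue = Queue()
    x_values = [1.0, 10.0, 100.0]
    processes = [Process(target=worker, args=(x, queue)) for x in x_values]

    for process in processes:
        process.start()

    # get() blocks until a result is available; collecting all results before join() is the safe order here
    outputs = [queue.get() for _ in range(len(x_values))]

    # join() waits for each child to exit; by now they have all put their results
    for process in processes:
        process.join()

    print(outputs)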

eagr