So, I have a problem which I assume must be common:
I'd like to parallelize a script with a multiprocessing.Pool, handing inputs to the pool, having it process them in parallel, and receiving the outputs in the parent process.
apply_async() looks like the best fit for what I want to do. But I can't just give it a callback function, since in the end I want to print all the results to a single file. I think handing it a callback that prints to a single filehandle will produce jumbled results (I'm not even sure I can pass a filehandle between processes like that).
So what's the best way to submit inputs to the Pool, then receive the outputs and handle them in the main process? At the moment I'm just collecting the AsyncResult objects in a list and periodically iterating through it, calling the .get() method on each.
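For reference, the approach I'm describing can be sketched roughly as follows. This is a minimal sketch, not my actual script: process_input and run_in_order are hypothetical names, and squaring a number stands in for the real work.

```python
import multiprocessing


def process_input(value):
    # Hypothetical stand-in for the real per-input work.
    return value * value


def run_in_order(inputs, workers=4):
    """Submit every input with apply_async, then collect results in submission order."""
    with multiprocessing.Pool(processes=workers) as pool:
        # Each apply_async call returns immediately with an AsyncResult handle.
        pending = [pool.apply_async(process_input, (value,)) for value in inputs]
        # get() blocks until that particular task has finished, so walking the
        # list front to back yields outputs in the same order as the inputs.
        return [result.get() for result in pending]


if __name__ == "__main__":
    print(run_in_order(range(10)))
```

Tasks may finish in any order inside the pool. The order of the returned list comes only from the order in which .get() is called on the stored AsyncResult objects.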
Update
I'll clarify a few parameters of my problem in response to comments:
@martineau and @Juggernaut: By "not jumbled" I mean that I'd really like to preserve the order of the input, so that the output comes out in the same order.
@RolandSmith and @martineau: My main process is just reading inputs from a file, handing them to the pool, and printing the results. I could just call .apply(), but then the main process would wait for the function to complete before proceeding. I'm using multiprocessing to reap the benefits of parallelization and have many inputs processed simultaneously.
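To make the setup concrete, here is a rough sketch of the pipeline I have in mind, assuming one input per line. The names process_line and process_file are hypothetical, and upper-casing a line stands in for the real work.

```python
import multiprocessing


def process_line(line):
    # Hypothetical stand-in for the real work on one input record.
    return line.strip().upper()


def process_file(in_path, out_path, workers=4):
    """Read inputs, process them in the pool, write outputs from the parent only."""
    with multiprocessing.Pool(processes=workers) as pool, \
            open(in_path) as src, open(out_path, "w") as dst:
        # apply_async does not block, so the parent keeps reading and
        # submitting lines while the workers are already busy.
        pending = [pool.apply_async(process_line, (line,)) for line in src]
        # Only the parent writes to dst, so output lines cannot interleave,
        # and draining pending in order keeps the output in input order.
        for result in pending:
            dst.write(result.get() + "\n")
```

In this sketch the output filehandle never leaves the main process. The workers only return values, which is the behavior I'm after.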