I'm trying to run some Python functions in parallel, and these functions have print calls throughout. What I want is for each subprocess, all running the same function, to output to the main stdout in a grouped manner: each subprocess's output should only be printed after it has finished its task. If, however, some kind of error occurred partway through, I still want to output whatever the subprocess managed to print before failing.
A small example:
from time import sleep
import multiprocessing as mp

def foo(x):
    print('foo')
    for i in range(5):
        print('Process {}: in foo {}'.format(x, i))
        sleep(0.5)

if __name__ == '__main__':
    pool = mp.Pool()
    jobs = []
    for i in range(4):
        job = pool.apply_async(foo, args=[i])
        jobs.append(job)
    for job in jobs:
        job.wait()
This runs in parallel, but the output is:
foo
Process 0: in foo 0
foo
Process 1: in foo 0
foo
Process 2: in foo 0
foo
Process 3: in foo 0
Process 1: in foo 1
Process 0: in foo 1
Process 2: in foo 1
Process 3: in foo 1
Process 1: in foo 2
Process 0: in foo 2
Process 2: in foo 2
Process 3: in foo 2
Process 1: in foo 3
Process 0: in foo 3
Process 3: in foo 3
Process 2: in foo 3
Process 1: in foo 4
Process 0: in foo 4
Process 3: in foo 4
Process 2: in foo 4
What I want is:
foo
Process 3: in foo 0
Process 3: in foo 1
Process 3: in foo 2
Process 3: in foo 3
Process 3: in foo 4
foo
Process 1: in foo 0
Process 1: in foo 1
Process 1: in foo 2
Process 1: in foo 3
Process 1: in foo 4
foo
Process 0: in foo 0
Process 0: in foo 1
Process 0: in foo 2
Process 0: in foo 3
Process 0: in foo 4
foo
Process 2: in foo 0
Process 2: in foo 1
Process 2: in foo 2
Process 2: in foo 3
Process 2: in foo 4
The particular order of the processes doesn't matter, as long as each subprocess's output is grouped together. Interestingly enough, I do get my desired output if I redirect to a file:
python test.py > output
Presumably that is because stdout becomes block-buffered rather than line-buffered when it isn't a terminal, so each process's output gets flushed in one chunk when it exits.
I know that the subprocesses do not get their own stdout; they all share the main stdout. I've thought about and looked up some solutions to this, such as using a Queue: give each subprocess its own fake stdout, override its flush method so that the buffered output is pushed back onto the Queue once the function is done, and then read the contents from the main process. However, although this does satisfy what I want, I cannot retrieve the output if the function stopped halfway through; it only produces output on successful completion. I got the idea from Access standard output of a sub process in python.
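Roughly, what I mean looks like this (my own sketch of that answer's idea; QueueWriter and worker are names I made up):

import io
import sys
import multiprocessing as mp
from time import sleep

def foo(x):
    print('foo')
    for i in range(5):
        print('Process {}: in foo {}'.format(x, i))
        sleep(0.5)

class QueueWriter(io.TextIOBase):
    # Stand-in for stdout: buffer everything written in this process,
    # and push the whole buffer onto the queue only on flush().
    def __init__(self, queue):
        self.queue = queue
        self.parts = []

    def write(self, text):
        self.parts.append(text)
        return len(text)

    def flush(self):
        self.queue.put(''.join(self.parts))
        self.parts = []

def worker(queue, x):
    sys.stdout = QueueWriter(queue)  # every print in this process is buffered
    foo(x)
    sys.stdout.flush()               # only reached if foo() completes

if __name__ == '__main__':
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(queue, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    while not queue.empty():
        sys.stdout.write(queue.get())

This groups the output nicely, but you can see the problem: if foo raises partway through, the flush is never reached and everything that process printed is lost.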
I've also seen the usage of locks, which works, but it completely kills running the function in parallel, since each subprocess has to wait for the others to finish executing foo.
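For reference, this is the kind of lock usage I mean (again my own sketch): the output stays grouped, but the processes effectively run one at a time.

import multiprocessing as mp
from time import sleep

def foo_locked(lock, x):
    # Holding the lock for the whole call keeps each process's output
    # together, but it also serializes the processes entirely.
    with lock:
        print('foo')
        for i in range(5):
            print('Process {}: in foo {}'.format(x, i))
            sleep(0.5)

if __name__ == '__main__':
    lock = mp.Lock()
    procs = [mp.Process(target=foo_locked, args=(lock, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()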
Also, if possible, I'd like to avoid changing the implementation of my foo function, as I have many functions that would need to change.
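To make that constraint concrete, what I'm imagining is a wrapper along these lines (just a sketch; capture_output is my own hypothetical helper, and it relies on contextlib.redirect_stdout from Python 3.4+), which leaves foo untouched and keeps partial output on failure:

import io
import multiprocessing as mp
from contextlib import redirect_stdout
from time import sleep

def foo(x):
    print('foo')
    for i in range(5):
        print('Process {}: in foo {}'.format(x, i))
        sleep(0.5)

def capture_output(func, *args):
    # Hypothetical wrapper: run func with its prints redirected into a
    # buffer, and return the captured text even if func raised partway.
    buf = io.StringIO()
    try:
        with redirect_stdout(buf):
            func(*args)
    except Exception as e:
        return buf.getvalue() + 'Process failed: {}\n'.format(e)
    return buf.getvalue()

if __name__ == '__main__':
    pool = mp.Pool()
    jobs = [pool.apply_async(capture_output, args=[foo, i]) for i in range(4)]
    for job in jobs:
        print(job.get(), end='')
    pool.close()
    pool.join()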
EDIT: I have looked into the libraries dispy and Parallel Python. dispy does exactly what I want: it keeps a separate stdout/stderr that I can just print out at the end. The problem with dispy is that I have to manually run the server in a separate terminal, and I want to be able to run my Python program all in one go without first having to open another script. Parallel Python, on the other hand, also does what I want, but it seems to lack control in places and has some annoying nuisances. In particular, when you print out the output, it also prints out the function's return type, and I just want the output I produced with print. Also, when running a function, you have to give it a list of the modules it uses, which is slightly annoying, since I do not want to maintain a big list of imports just to run a simple function.