
I'm using the Python multiprocessing module to parallelize some computationally heavy tasks. The obvious choice is to use a Pool of workers and its map method.

However, processes can fail. For instance, they may be silently killed by the oom-killer. I would therefore like to be able to retrieve the exit code of the processes launched with map.

Additionally, for logging purposes, I would like to be able to know the PID of the process launched to execute each value in the iterable.

Mathieu Dubois

1 Answer


If you're using multiprocessing.Pool.map, you're generally not interested in the exit codes of the sub-processes in the pool; you're interested in the values they returned from their work items. This is because under normal conditions, the processes in a Pool won't exit until you close/join the pool, so there are no exit codes to retrieve until all work is complete and the Pool is about to be destroyed. Because of this, there is no public API to get the exit codes of those sub-processes.
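
As a quick illustration of that point, here's a minimal sketch of my own (peeking at the private _pool attribute discussed below): worker exit codes only become meaningful once the pool has been closed and joined:

import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    print(pool.map(square, range(4)))        # [0, 1, 4, 9] -- the return values you care about
    print([p.exitcode for p in pool._pool])  # [None, None] -- the workers are still alive
    pool.close()
    pool.join()
    print([p.exitcode for p in pool._pool])  # [0, 0] -- exit codes exist only now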

Now, you're worried about exceptional conditions, where something out-of-band kills one of the sub-processes while it's doing work. If you hit an issue like this, you're probably going to run into some strange behavior. In fact, in my tests where I killed a process in a Pool while it was doing work as part of a map call, map never completed, because the killed process didn't complete. Python did, however, immediately launch a new process to replace the one I killed.
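
Here's a rough repro of that hang as a sketch (Unix-only, since it uses SIGKILL): one work item kills its own worker to simulate the oom-killer, and the map_async result never becomes ready, even though the Pool spawns a replacement worker:

import os
import signal
import multiprocessing

def work(x):
    if x == 0:
        os.kill(os.getpid(), signal.SIGKILL)  # Simulate an out-of-band kill (e.g. the oom-killer).
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    result = pool.map_async(work, range(10))
    result.wait(timeout=5)
    print(result.ready())  # False: the killed work item never completes.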

That said, you can get the pid of each process in your pool by accessing the multiprocessing.Process objects inside the pool directly, using the private _pool attribute:

import multiprocessing

pool = multiprocessing.Pool()
for proc in pool._pool:  # _pool is a private list of the worker Process objects.
    print(proc.pid)

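If the goal is the per-item logging mentioned in the question, a simpler option (a sketch of my own, not strictly part of the approach below) is to have the work function report its own PID via os.getpid():

import os
import multiprocessing

def func(x):
    # multiprocessing.current_process().pid would work here too.
    print("PID %d is handling value %r" % (os.getpid(), x))
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        print(pool.map(func, range(4)))
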
So, one thing you could do is try to detect when a process has died unexpectedly (assuming you don't get stuck in a blocking call as a result). You can do this by examining the list of processes in the pool before and after making a call to map_async:

before = pool._pool[:]  # Make a copy of the list of Process objects in our pool.
result = pool.map_async(func, iterable)  # Use map_async so we don't get stuck.
while not result.ready():  # Wait for the call to complete.
    # exitcode is None while a process is alive, so compare against None
    # rather than relying on truthiness (a clean exit code of 0 is falsy).
    if any(proc.exitcode is not None for proc in before):
        print("One of our processes has exited. Something probably went horribly wrong.")
        break
    result.wait(timeout=1)
else:  # We'll enter this block if we don't reach `break` above.
    print(result.get())  # Actually fetch the result list here.

We have to make a copy of the list because when a process in the Pool dies, Python immediately replaces it with a new process, and removes the dead one from the list.

This worked for me in my tests, but because it relies on a private attribute of the Pool object (_pool), it's risky to use in production code. I would also suggest that it may be overkill to worry too much about this scenario, since it's very unlikely to occur and handling it complicates the implementation significantly.

dano
  • You're right: when such problems arise, I get strange results. In the current code, processes are launched by a special loop. My idea was to replace this loop with a (much simpler) call to map. However, your solution is about as complex as the current one, so I don't think it's worth it. Also, the processes write their results to disk (so I don't really need their return values). Still, your answer is interesting. – Mathieu Dubois Jun 25 '14 at 12:32
  • Also of note here: [`concurrent.futures.ProcessPoolExecutor`](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor) does detect when a process has been killed unexpectedly, and will raise a `BrokenProcessPool` on any outstanding tasks when it happens (a sketch of this behavior follows these comments). There is [a bug](http://bugs.python.org/issue22393) filed against `multiprocessing`, with a working patch, to add this behavior to `multiprocessing.Pool` as well. – dano Mar 27 '15 at 16:12
  • Billiard is a multiprocessing fork that seems to handle this case: https://pypi.org/project/billiard/ – JAR.JAR.beans Aug 08 '17 at 07:29
  • @JAR.JAR.beans Excellent find, thank you so much! I've been having exactly this problem with dying processes and looking for a way to capture more info about them: this module does the trick nicely. Meanwhile, I have filed a Python issue since I think this needs resolving in the standard library: https://bugs.python.org/issue43449 – jkp Mar 09 '21 at 20:27
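
As a follow-up to dano's comment above, here is a minimal sketch of my own (reusing the self-SIGKILL trick from earlier, so again Unix-only) showing ProcessPoolExecutor raising BrokenProcessPool promptly instead of hanging:

import os
import signal
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

def work(x):
    if x == 0:
        os.kill(os.getpid(), signal.SIGKILL)  # Simulate an out-of-band kill.
    return x * x

if __name__ == '__main__':
    try:
        with ProcessPoolExecutor(max_workers=2) as executor:
            print(list(executor.map(work, range(4))))
    except BrokenProcessPool as exc:
        print("Pool is broken:", exc)  # Raised instead of hanging forever.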