
The logic of my multiprocessing program that tries to handle exceptions in processes is pretty much like the following:

import multiprocessing

class CriticalError(Exception):

    def __init__(self, error_message):
        print error_message
        q.put("exit")  # 'q' is the module-global Queue created below (inherited on fork)


def foo_process():
    while True:
        try:
            line = open("a_file_that_does_not_exist").readline()
        except IOError:
            raise CriticalError("IOError")

        try:
            text = line.split(',')[1]
            print text
        except IndexError:
            print 'no text'

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=foo_process)
    p.start()

    while True:
        if not q.empty():
            msg = q.get()
            if msg == "exit":
                p.terminate()
                exit()

If I don't have the try-except around the file operation, I get

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "foo.py", line 22, in foo_process
    line = open("a_file_that_does_not_exist").readline()
IOError: [Errno 2] No such file or directory: 'a_file_that_does_not_exist'

but the program remains open. Is there a Pythonic way to remove the try-except clause related to IOError, or more generally, to have every unhandled exception either put the "exit" message into Queue 'q', or terminate the process and exit the program some other way? This would clean up my codebase considerably, since I wouldn't have to catch errors that, in applications without multiprocessing, kill the program automatically. It would also let me use assertions, since an AssertionError would then also exit the program. Whatever the solution, I'd like to be able to see the traceback -- my current solution doesn't provide it.
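To illustrate, what I have in mind is roughly the wrapper sketched below (worker_wrapper is a made-up name, not something I already have):

import traceback

def worker_wrapper(target):
    # Hypothetical wrapper: run the real worker and, on any unhandled
    # exception, print the traceback and signal the main process
    # through the global Queue 'q'.
    try:
        target()
    except Exception:
        traceback.print_exc()
        q.put("exit")

# p = multiprocessing.Process(target=worker_wrapper, args=(foo_process,))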

maqp

1 Answer


Since the child will die on the exception anyway (i.e. p.terminate() is pointless), why not let the master process check whether its child is still alive?

from queue import Empty
# from Queue import Empty  # if Python 2.x

while True:
    if not p.is_alive():
        break

    try:
        msg = q.get(timeout=1)
    except Empty:
        continue

    # other message handling code goes here

# some graceful cleanup
exit()

Note that I've added a timeout on get so it won't block forever when the child is dead. You can adjust the period to your needs.

With that you don't need to do anything unusual in the child process, like pushing to a queue on error. Besides, the original approach will fail in some rare cases; e.g. a force kill of the child will cause the master to hang forever (because the child won't have time to push anything to the queue).

You can potentially retrieve the traceback from the child process by rebinding sys.stdout (and/or sys.stderr) inside the foo_process function (to the parent's stdout, a file, or anything else with a file-like interface). Have a look here:

Log output of multiprocessing.Process
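A minimal sketch of that idea (the log file name here is made up):

import sys

def foo_process():
    # Rebind this child's stderr to a log file; the traceback that
    # multiprocessing prints for an unhandled exception in the child
    # then ends up in that file.
    sys.stderr = open("foo_process_errors.log", "w")
    line = open("a_file_that_does_not_exist").readline()  # raises IOError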


Without a queue and with multiple processes, I would go for something like this:

import time

processes = [f, b, c]  # started multiprocessing.Process objects
while processes:
    time.sleep(1)
    for p in processes:
        if not p.is_alive():
            processes.remove(p)
            break
exit()

which can be done better with joins:

processes = [f, b, c]
for p in processes:
    p.join()
exit()

assuming that the master is not supposed to do anything else while waiting for its children.
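If the master does have other things to do in the meantime, join also accepts a timeout, so it can still wake up periodically; a minimal sketch:

processes = [f, b, c]
while processes:
    p = processes[0]
    p.join(timeout=1)  # returns after at most one second
    if not p.is_alive():
        processes.remove(p)
    # the master can do other periodic work here
exit()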

freakish
  • Checking if the child is alive looks like a great idea. The queue won't be delivering anything other than that message, so it won't be needed (I'm not sure if there's any other reason your solution checks the queue). I should have mentioned that there is more than one process. But I did query `is_alive` for each process in a loop, and if any of them was dead, killed the other processes before exiting. It prints the traceback and exits cleanly: http://pastebin.com/K1C9TbNY – maqp Aug 10 '16 at 15:12
  • @maqp It's fine, you can keep a list of processes and check if they are alive in a `for` loop. Also removing the queue is a big gain. But in that case you would probably need some `time.sleep()` to avoid [busy waiting](https://en.wikipedia.org/wiki/Busy_waiting). – freakish Aug 10 '16 at 15:14
  • (Yes, I'm using delays but left them out to keep example code minimal). Many thanks! – maqp Aug 10 '16 at 15:21
  • @maqp What I meant is that `time.sleep(1)` should be in the master loop (otherwise it will spin forever, taking 100% of one CPU core). Also, I've modified your code a bit (I think you should wait for the other processes, not just selfishly exit when one of the children misbehaves ;) but that obviously depends on the use case). Have a look at my updated answer. – freakish Aug 10 '16 at 15:25
  • Yes, the master loop also has a delay, although it wasn't in the updated version. I'm giving it my best effort to ensure the software can gracefully exit at any point without data loss (e.g. some functions are run as threads). The 'selfish' behavior of killing all children if an unhandled exception occurs in any child process is intentional.