5

I have the following Python program running in a Docker container.

Basically, if the Python process exits gracefully (e.g. when I manually stop the container) or if the Python process crashes (while inside some_other_module.do_work()), then I need to do some cleanup and ping my DB to tell it that the process has exited.

What's the best way to accomplish this? I saw one answer that wrapped main() in a try/except, but that seems a bit odd.

My code:

def main():
    some_other_module.do_work()

if __name__ == '__main__':
    main()
double-beep
farza
    How much detection do you want to do? Detecting an exception is easy. Detecting an OS error can be trickier. Detecting a kernel panic or critical failure is difficult. Detecting someone unplugging the machine physically is nigh impossible without special hardware. – Silvio Mayolo Feb 19 '19 at 18:00
  • Interested in your use case @farza - are you expecting a crash, of known cause? – jtlz2 Feb 19 '19 at 18:08
  • A process shouldn't crash in any circumstances. If it does, the code isn't properly written, and trying to solve the problem at this level is just "covering up the dirt". Do you have a concrete example? Otherwise this seems to be an *XY problem*. – CristiFati Feb 19 '19 at 18:10
  • In our case, `some_other_module.do_work()` may error out after it has run for a certain period of time. It's basically reading a video stream, and when that stream ends, the Python process exits cleanly (Case 1). But there are cases where exceptions occur while reading this stream and we can't exit gracefully (Case 2). The third case is where we just stop the Docker container and the Python process exits gracefully (Case 3). – farza Feb 19 '19 at 18:19
  • What do you mean by "*we can't gracefully exit (Case 2)*"? If there's some exception while handling the stream, and the program exits, it should return a *non 0* exit code. If this is the case, checking its exit code (if no longer running) would do. – CristiFati Feb 19 '19 at 20:19

4 Answers

5

I assume that the additional cleanup will be done by a different process, since the main process has likely crashed in a non-recoverable way (that is how I understood the question).

The simplest way would be that the main process sets a flag somewhere (maybe creates a file in a specified location, or a column value in a database table; could also include the PID of the main process that sets the flag) when it starts and removes (or un-sets) that same flag if it finishes gracefully.

The cleanup process just needs to check the flag:

  • if the flag is set but the main process has ended already (the flag could contain the PID of the main process, so the cleanup process uses that to find if the main process is still running or not), then a cleanup is in order.
  • if the flag is set and the main process is running, then nothing is to be done.
  • if the flag is not set, then nothing is to be done.
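A minimal sketch of that flag-file check (the path and the return values are illustrative names, and the "is this PID alive" probe uses the POSIX trick of sending signal 0 with `os.kill`):

```python
import os

FLAG_PATH = "/tmp/worker.flag"  # illustrative location for the flag file


def pid_is_running(pid):
    """Check whether a process with this PID currently exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, but fails if the PID is gone
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def check_flag():
    """Return 'cleanup', 'running', or 'clean', matching the three cases above."""
    if not os.path.exists(FLAG_PATH):
        return "clean"           # flag not set: nothing to do
    with open(FLAG_PATH) as f:
        pid = int(f.read().strip())
    if pid_is_running(pid):
        return "running"         # main process still alive: nothing to do
    return "cleanup"             # flag set but process gone: cleanup needed
```

The main process would write its own PID to `FLAG_PATH` at startup and delete the file on a graceful exit; the cleanup process just calls `check_flag()` periodically.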
Ralf
  • Interesting, so, a process to monitor the process. That could certainly work and seems like a good solution. So basically - 1) start main process 2) main process writes to file with pid 3) main process spawns monitor process 4) monitor process polls to check if main process has crashed based on PID in file. Am I getting that right? – farza Feb 19 '19 at 18:26
  • Yes, that is one way. Or, the monitor process could be launched by someone else: maybe a cronjob launches the monitor every 5 minutes or something, depending on your use case. – Ralf Feb 19 '19 at 18:28
  • So, I thought a bit more about this and the solution actually does not work for the case where I stop the Docker container holding both processes. The container would hold both the monitor and main process. But, when you stop the container both will be killed! – farza Feb 19 '19 at 18:58
  • You could put that monitor script onto another container and let the flag be in a third container, but this scheme will also fail if, for example, a meteorite strikes the datacenter that holds all of your docker containers. – Ralf Feb 19 '19 at 19:15
  • My point is, there is no perfect solution, but you need to decide what level of resilience you need. – Ralf Feb 19 '19 at 19:15
  • I ended up building a solution like this which simply keeps a list of `aliveIds` and constantly polls each container to check if that process is still running. If it isn't, then the program exited and we can do cleanup for that process. Thanks for your help! – farza Feb 20 '19 at 18:53
1

A try/except around main() seems simplest, but it may not work in many cases (please see the comments below). You can always catch specific exceptions:

def main():
    some_other_module.do_work()

if __name__ == '__main__':
    try:
        main()
    except GracefulInterrupt:  # placeholder: the exception your code raises on a clean stop
        pass                   # finished gracefully
    except Exception as e:
        print(e)
        # crash
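A self-contained sketch of that graceful-vs-crash split (`report_exit` and `run` are illustrative names; the real code would ping the DB and call main()). Note that a manual stop typically surfaces as KeyboardInterrupt or SystemExit, which are not subclasses of Exception:

```python
def report_exit(status):
    """Stand-in for pinging the DB; here it just prints the outcome."""
    print(f"process exited: {status}")


def run(work):
    """Call work(), report how it ended, and return an exit status."""
    try:
        work()
    except (KeyboardInterrupt, SystemExit):
        report_exit("graceful")           # manual stop, e.g. Ctrl-C or exit()
        return 0
    except Exception as exc:
        report_exit(f"crashed: {exc}")    # anything else counts as a crash
        return 1
    report_exit("graceful")               # work() returned normally
    return 0
```

This still cannot see a SIGKILL or a segfault inside a C extension, which is why the external-monitor answer exists.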
PrinceOfCreation
  • This won't work (in most of the cases). E.g. try placing code that segfaults in a *try* / *except*. It will be useless. – CristiFati Feb 19 '19 at 18:03
1

Use a try/except

def thing_that_crashes():
    exit()

try:
    thing_that_crashes()
except:  # a bare except also catches SystemExit, which exit() raises
    print('oh and by the way, that thing tried to kill me')

I think it is impossible to catch a process with advanced suicidal behaviour (sending a SIGKILL to itself, for example), so if you need your main process to survive whatever happens, maybe run the other one in a subprocess.

Benoît P
1

You could wrap your script in another script that launches it as a subprocess and checks the return code. Inspired by this relevant question.

from subprocess import Popen

script = Popen(["python", "abspath/to/your/script.py"])
script.communicate()
if script.returncode != 0:
    # something went wrong
    # do something about it
    pass
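One detail worth knowing when checking the return code: on POSIX, subprocess reports a child killed by a signal as a negative returncode (-N for signal N, so -9 means SIGKILL). A short sketch, using an inline child process in place of the real script:

```python
import subprocess
import sys

# Launch a child that exits with code 3 (stands in for your real script)
proc = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
proc.communicate()

if proc.returncode == 0:
    print("child exited cleanly")
elif proc.returncode < 0:
    # POSIX: a negative code means the child was terminated by that signal
    print(f"child killed by signal {-proc.returncode}")
else:
    print(f"child failed with exit code {proc.returncode}")
```

This lets the wrapper distinguish a clean exit (Case 1/3) from a crash or a kill, and ping the DB accordingly.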
r.ook