I'm using Python to benchmark something. This can take a long time, and I want to set a (global) timeout. I use the following script (summarized):

import signal
import subprocess

class TimeoutException(Exception):
    pass

def timeout_handler(signum, frame):
    raise TimeoutException()

# Halt problem after half an hour
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(1800)
try:
    while solution is None:
        current_guess = guess()  # don't rebind the name of the guess() function
        try:
            with open(solutionfname, 'wb') as solutionf:
                solverprocess = subprocess.Popen(["solver", problemfname], stdout=solutionf)
                solverprocess.wait()
        finally:
            # `solverprocess.poll() == None` instead of try didn't work either
            try:
                solverprocess.kill()
            except:
                # Solver process was already dead
                pass
except TimeoutException:
    pass
# Cancel alarm if it's still active
signal.alarm(0)

However, it sometimes leaves orphan processes behind, and I can't reliably reproduce the circumstances. Does anyone know the correct way to prevent this?

dtech

3 Answers

You simply have to wait after killing the process.
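
As a minimal sketch of that advice (the helper name `kill_and_reap` is made up for illustration, it is not part of `subprocess`), the cleanup step could look like this:

```python
import subprocess

def kill_and_reap(proc):
    """Kill the child if it is still running, then collect its exit status."""
    if proc.poll() is None:   # None means the child has not exited yet
        proc.kill()           # on POSIX this only sends SIGKILL
    return proc.wait()        # reaps the child, so no zombie is left behind
```

Calling this from the `finally` block instead of a bare `kill()` should leave nothing unreaped, assuming the solver does not start helper processes of its own.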

Daniel
  • So if `p.kill()` is called but `p` hasn't exited before python exits `p` isn't killed at all? I'll test if that's true, but why would that be? From what I understand `p.kill()` sends `SIGKILL` to `p` which should cause `p` to die even if python exits. – dtech Oct 11 '14 at 12:27
  • @dtech The process *is* killed but *the kernel* does *not* remove it because it is waiting for the parent process to read its status. – Bakuriu Oct 11 '14 at 12:28
  • @dtech: the process is dead ([SIGKILL always works](http://unix.stackexchange.com/q/5642/1321)), but the zombie remains until it is reaped. If the original parent process is dead, then a dedicated process such as `init` (PID 1) will collect the status. Note: [an orphan process must be alive by definition](https://en.wikipedia.org/wiki/Orphan_process) (its parent is dead). Your code creates zombies, not orphans. – jfs Jun 12 '15 at 19:43

The documentation for the kill() method states:

Kills the child. On Posix OSs the function sends SIGKILL to the child. On Windows kill() is an alias for terminate().

In other words, if you aren't on Windows, you are only sending a signal to the subprocess. This will create a zombie process because the parent process didn't read the return value of the subprocess.

The kill() and terminate() methods are just shortcuts to send_signal(SIGKILL) and send_signal(SIGTERM).

Try adding a call to wait() after the kill(). This is even shown in the example under the documentation for communicate():

import subprocess

proc = subprocess.Popen(...)
try:
    outs, errs = proc.communicate(timeout=15)
except subprocess.TimeoutExpired:
    proc.kill()
    outs, errs = proc.communicate()

Note the call to communicate() after the kill(). (It is equivalent to calling wait() and also reading the outputs of the subprocess.)


I want to clarify one thing: it seems like you don't understand exactly what a zombie process is. A zombie process is a terminated process. The kernel keeps its entry in the process table until the parent process reads its exit status. I believe all memory used by the subprocess is actually released; the kernel only has to keep track of the exit status of such a process.

So, the zombie processes you see aren't running. They are already completely dead, and that's why they are called zombies. They are "alive" in the process table, but aren't really running at all.

Calling wait() does exactly this: wait till the subprocess ends and read the exit status. This allows the kernel to remove the subprocess from the process table.
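
For the global timeout in the question, one possible alternative to `signal.alarm` is to pass the remaining time to `wait()` and always kill and reap on the way out. This is only a sketch assuming Python 3.3+ (where `wait()` accepts `timeout`); `run_with_deadline` is a hypothetical helper name, and `deadline` is assumed to be a `time.monotonic()` value:

```python
import subprocess
import time

def run_with_deadline(cmd, deadline, **popen_kwargs):
    """Run cmd; return its exit code, or None if the deadline passed first.

    `deadline` is an absolute time.monotonic() timestamp shared by all runs.
    """
    proc = subprocess.Popen(cmd, **popen_kwargs)
    try:
        remaining = max(0.0, deadline - time.monotonic())
        return proc.wait(timeout=remaining)
    except subprocess.TimeoutExpired:
        return None
    finally:
        if proc.poll() is None:   # still running: timeout or another exception
            proc.kill()
        proc.wait()               # read the exit status so the kernel can drop the entry
```

The benchmark loop would compute `deadline = time.monotonic() + 1800` once and stop as soon as a call returns `None`.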

Bakuriu
  • You're right, I meant 'orphan process'. They're definitely not dead because they keep consuming CPU and RAM. I'll try your and Daniel's suggestion. – dtech Oct 11 '14 at 12:34
  • @dtech That's a completely different scenario. An orphan process happens when its parent process dies. Then it becomes a child of `init`. In your example I don't really see where these may come from. More information about the subprocess you are launching is required. Also, are you sure that you want to `kill` that subprocess? *That* may cause creation of orphan processes because `kill`ing doesn't allow the subprocess to perform any cleanup. You should first try to call `terminate` and afterwards, if that fails, call `kill`. – Bakuriu Oct 11 '14 at 12:44
  • You're right that SIGTERM is probably better, but I don't know how SIGKILL could cause the process to not be killed and be orphaned. The subprocess is a [SAT-solver](http://fmv.jku.at/lingeling/). I'll try it with terminate and calling kill after a while. – dtech Oct 11 '14 at 12:55
  • @dtech The sat solver may spawn some subprocess to perform the actual computations. Your `SIGKILL` kills *only* the main process. Using `SIGTERM` *may* let the main process terminate its subprocesses before exiting, it depends on how well it was implemented. – Bakuriu Oct 11 '14 at 13:01
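
If the solver really does start helper processes, one way to reach all of them is to put it in its own process group and signal the group, trying `SIGTERM` first and `SIGKILL` only after a grace period. The following is a POSIX-only sketch under that assumption; `start_in_new_group` and `stop_group` are illustrative names, and helpers that move themselves into another session or group would still escape:

```python
import os
import signal
import subprocess

def start_in_new_group(cmd, **popen_kwargs):
    """Start cmd as a session leader, so its process-group id equals its pid."""
    return subprocess.Popen(cmd, start_new_session=True, **popen_kwargs)

def stop_group(proc, grace=5.0):
    """SIGTERM the whole group, escalate to SIGKILL after `grace` seconds, then reap."""
    try:
        os.killpg(proc.pid, signal.SIGTERM)
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)   # the leader ignored SIGTERM
    except ProcessLookupError:
        pass                                  # the group is already gone
    return proc.wait()                        # collect the leader's exit status
```

Only call `stop_group` on a process that has not been waited on yet; once the pid has been reaped, the kernel may reuse it for an unrelated process.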

On Linux, you can use python-prctl.

Define a preexec function such as:

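import signal
import prctl  # provided by the python-prctl package

def pre_exec():
    # Runs in the child before exec: send it SIGTERM when the parent dies
    prctl.set_pdeathsig(signal.SIGTERM)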

Then pass it to your Popen call:

subprocess.Popen(..., preexec_fn=pre_exec)

It's as simple as that. Now the child process will die rather than become an orphan if the parent dies.

If you don't like the external dependency of python-prctl, you can also use the older prctl. Instead of

prctl.set_pdeathsig(signal.SIGTERM)

you would have

prctl.prctl(prctl.PDEATHSIG, signal.SIGTERM)
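
If neither module is available, the same system call can probably be reached through `ctypes`. This is a Linux-only sketch, not something either package documents: the constant value 1 for `PR_SET_PDEATHSIG` is taken from `<linux/prctl.h>`, and `set_pdeathsig_sigterm` is a made-up name.

```python
import ctypes
import signal
import subprocess

PR_SET_PDEATHSIG = 1  # from <linux/prctl.h>

# Load the C library symbols once in the parent, before any fork.
_libc = ctypes.CDLL(None, use_errno=True)

def set_pdeathsig_sigterm():
    """preexec_fn: ask the kernel to send this child SIGTERM when its parent dies."""
    if _libc.prctl(PR_SET_PDEATHSIG, int(signal.SIGTERM), 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")
```

It is passed the same way, `subprocess.Popen(..., preexec_fn=set_pdeathsig_sigterm)`. Keep in mind that `preexec_fn` is documented as unsafe in multi-threaded programs.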
Finch_Powers