In Python 2.7 on Ubuntu 14.04, I launch a process like this:

import signal
import subprocess
import time

# cmd is the argument list for the child process, defined elsewhere
bag_process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for i in range(5):
    print "Countdown: {}".format(5 - i - 1)
    time.sleep(1)
print "Sending SIGINT to PID {}".format(bag_process.pid)
bag_process.send_signal(signal.SIGINT)
(bag_out, bag_err) = bag_process.communicate()

The program hangs on the communicate() line. When I open another terminal, I run ps -ef | grep ### to find the pid of the subprocess, and I see it's <defunct>.

Why is the child program becoming defunct, and the parent program hanging on communicate()? Provided that the child truly exits after receiving SIGINT, how can I make the parent program reliably handle that without hanging?

martineau
Edward Ned Harvey
  • It is defunct because it has exited but the parent process hasn't read its exit code yet; see https://en.wikipedia.org/wiki/Zombie_process. Perhaps the SIGINT did something to the pipe that the child process was attached to, like signalling it to re-open file handles? Without the process itself, we can't be more specific. – Martijn Pieters Jun 01 '18 at 15:16
  • Any chance your dead program started still-alive children (which never closed the stdout and stderr they inherited), or passed its stdout or stderr FDs over a socket to a different program that still has them open? Either of those scenarios would provide a solid explanation for `communicate()` not being able to complete and return. – Charles Duffy Jun 01 '18 at 15:52
  • @MartijnPieters Thanks, but I believe the `communicate()` method uses `wait()` to get the exit status, even though `communicate()` itself does not return the exit status. https://github.com/python/cpython/blob/2.7/Lib/subprocess.py#L452 – Edward Ned Harvey Jun 01 '18 at 22:52
  • @CharlesDuffy Good thought, but no dice... I grepped for the defunct pid, and it wasn't listed as a PPID for any other PIDs. – Edward Ned Harvey Jun 01 '18 at 22:53
  • @EdwardNedHarvey: yes, but it never gets there because the pipe is blocked somewhere. – Martijn Pieters Jun 01 '18 at 23:17
  • @MartijnPieters Yup. If I remove `stdout=subprocess.PIPE, stderr=subprocess.PIPE` then everything works fine. But of course, I can't get the output of the child process. In the current case, I can live with that, but it seems like a python bug handling the closing & reading of subprocess file handles. – Edward Ned Harvey Jun 02 '18 at 19:48
  • @EdwardNedHarvey: no, this is something the child process does to the pipe on a SIGINT. It may mean it is incompatible with your use-cases. – Martijn Pieters Jun 02 '18 at 20:58
  • @EdwardNedHarvey, ...so, a good place to start is using `lsof` to look for the process having a still-open handle on the other end of one of the pipes... or, if you're set up with better tooling, `sysdig`. – Charles Duffy Jun 03 '18 at 02:47
  • @EdwardNedHarvey, ...btw, if you're on a glibc-based platform, does running `['stdbuf', '-oL'] + cmd, ...` make any difference (telling your child process to generate line-buffered output on stdout, if it sticks with the libc defaults)? – Charles Duffy Jun 04 '18 at 16:56
  • I figured it out. Charles Duffy was right. The child process starts grandchild processes, but when the child dies, the PPID of the grandchildren gets reset to 1. So when I grepped for the defunct pid, I found none. I needed to grep for the child pid *before* the child dies. – Edward Ned Harvey Jun 05 '18 at 00:33
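
The diagnosis in the last comment can be reproduced with a small sketch. It is not the original program: `CHILD_CMD` and `timed_communicate` are hypothetical names, and the child here is a stand-in that starts a one-second grandchild and exits right away. The point it illustrates is that `communicate()` waits for EOF on the pipes, and EOF only arrives once every process holding the write end has closed it.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for the real child: it starts a grandchild that
# sleeps for one second, prints a line, and exits right away. The grandchild
# inherits the child's stdout and stderr, which are the parent's pipes.
CHILD_CMD = [
    sys.executable, "-c",
    "import subprocess, sys; "
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(1)']); "
    "print('child exiting')",
]


def timed_communicate(cmd):
    """Run cmd with piped output; return (stdout, seconds communicate() took)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    start = time.time()
    out, _ = proc.communicate()
    return out, time.time() - start
```

Calling `timed_communicate(CHILD_CMD)` should take about one second even though the child itself exits almost immediately, because the grandchild still holds the write end of both pipes. With a grandchild that never exits, the call would block indefinitely, which matches the hang in the question.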

1 Answer

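The problem was that the child starts grandchild processes, which inherit its stdout and stderr pipes. They keep running after the child dies, so the pipes never reach EOF and communicate() blocks. Don't signal only the child, like this: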

bag_process.send_signal(signal.SIGINT)

Instead, kill the process and all of its sub-processes like this:

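import signal
import psutil

parent = psutil.Process(bag_process.pid)
# psutil >= 2.0 names this children(); older releases called it get_children()
for child in parent.children(recursive=True):
    child.send_signal(signal.SIGINT)
bag_process.send_signal(signal.SIGINT)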
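
An alternative that avoids the psutil dependency is to start the child in its own process group and signal the whole group. The following is a minimal sketch of that idea, not what I actually ran: `interrupt_process_group` and `DEMO_CMD` are hypothetical names, the demo child is a stand-in that starts a long-lived grandchild, and it assumes a POSIX system and descendants that stay in the child's process group. On Python 3, `start_new_session=True` can be used in place of `preexec_fn=os.setsid`.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for cmd: a child that starts a long-lived grandchild.
DEMO_CMD = [
    sys.executable, "-c",
    "import subprocess, sys, time; "
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(20)']); "
    "time.sleep(20)",
]


def interrupt_process_group(cmd, run_seconds):
    """Run cmd in its own process group, SIGINT the group, collect output."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        preexec_fn=os.setsid,  # child becomes leader of a new session and group
    )
    time.sleep(run_seconds)
    # The new group's ID equals the child's PID, so this reaches the child
    # and every descendant that has not moved itself to another group.
    os.killpg(proc.pid, signal.SIGINT)
    out, err = proc.communicate()
    return out, err, proc.returncode
```

With this, `interrupt_process_group(DEMO_CMD, 1.0)` should return shortly after the signal is sent rather than waiting for the grandchild's 20-second sleep. One trade-off is that a child in a new session no longer receives Ctrl+C typed in the parent's terminal, so the parent has to forward the signal itself.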
Edward Ned Harvey