
I'm trying to write a python program that is able to interact with other programs. That means sending stdin and receiving stdout data. I cannot use pexpect (although it definitely inspired some of the design). The process I'm using right now is this:

  1. Attach a pty to the subprocess's stdout
  2. Loop until the subprocess exits by checking subprocess.poll
    • When there is data available in the stdout write that data immediately to the current stdout.
  3. Finish!

I've been prototyping some code (below) which works but seems to have one flaw that is bugging me. After the child process has completed, the parent process hangs if I do not specify a timeout when using select.select. I would really prefer not to set a timeout; it just seems a bit dirty. However, all the other ways I've tried to get around the issue don't seem to work. Pexpect seems to get around it by using os.execv and pty.fork instead of subprocess.Popen and pty.openpty, a solution I would prefer to avoid. Am I doing something wrong in how I check whether the subprocess is still alive? Is my approach incorrect?

The code I'm using is below. I'm running this on Mac OS X 10.6.8, but I need it to work on Ubuntu 12.04 as well.

This is the subprocess runner runner.py:

import subprocess
import select
import pty
import os
import sys

def main():
    master, slave = pty.openpty()

    process = subprocess.Popen(['python', 'outputter.py'], 
            stdin=subprocess.PIPE, 
            stdout=slave, stderr=slave, close_fds=True)

    while process.poll() is None:
        # Just FYI timeout is the last argument to select.select
        rlist, wlist, xlist = select.select([master], [], [])
        for f in rlist:
            output = os.read(f, 1000) # This is used because it doesn't block
            sys.stdout.write(output)
            sys.stdout.flush()
    print "**ALL COMPLETED**"

if __name__ == '__main__':
    main()

This is the subprocess code outputter.py. The strange random parts are just to simulate a program outputting data at random intervals. You can remove it if you wish. It shouldn't matter:

import time
import sys
import random

def main():
    lines = ['hello', 'there', 'what', 'are', 'you', 'doing']
    for line in lines:
        sys.stdout.write(line + random.choice(['', '\n']))
        sys.stdout.flush()
        time.sleep(random.choice([1,2,3,4,5])/20.0)
    sys.stdout.write("\ndone\n")
    sys.stdout.flush()

if __name__ == '__main__':
    main()

Thanks for any help you all can provide!

Extra note

pty is used because I want to ensure that stdout isn't buffered.

ravenac95

4 Answers


First of all, os.read does block, contrary to what you state; it just does not block after select has reported the descriptor as readable. Also, os.read on a closed file descriptor always returns an empty string, which you might want to check for.

The real problem, however, is that the master device descriptor is never closed, so the final select is the one that blocks. In a rare race condition, the child process exits between select and process.poll() and your program exits nicely. Most of the time, however, the select blocks forever.

If you install the signal handler as proposed by izhak, all hell breaks loose: whenever a child process terminates, the signal handler runs. After the signal handler runs, the original system call in that thread cannot be continued, so that syscall invocation returns with a nonzero errno, which often surfaces as a seemingly random exception in Python. Now, if elsewhere in your program you use some library that makes blocking system calls and does not know how to handle such exceptions, you are in big trouble (any os.read, for example, anywhere can now throw an exception, even after a successful select).

Weighing random exceptions thrown anywhere against a bit of polling, I don't think a timeout on select sounds like such a bad idea. Your process would hardly be the only (slow) polling process on the system anyway.
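Concretely, the timeout approach looks something like this. This is only a sketch: the run_with_timeout helper and the python -c child are mine, not from the original code, and the 50 ms timeout is an arbitrary choice.

```python
import os
import pty
import select
import subprocess

def run_with_timeout(cmd, timeout=0.05):
    master, slave = pty.openpty()
    process = subprocess.Popen(cmd, stdout=slave, stderr=slave,
                               close_fds=True)
    chunks = []
    while True:
        # The timeout turns the blocking select into a slow poll, so the
        # loop gets a chance to notice that the child has exited.
        rlist, _, _ = select.select([master], [], [], timeout)
        if rlist:
            chunks.append(os.read(master, 1000))
        elif process.poll() is not None:
            break  # nothing left in the pty buffer and the child is gone
    os.close(slave)
    os.close(master)
    return b''.join(chunks)
```

Because the break only happens when select reports nothing readable, any output buffered in the pty is drained before the loop ends.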

mmoya
  • Thanks for the fantastic explanation. I figured, after a while, that it would probably just be best to set a timeout. I tried izhak's solution but yes, I saw some very strange behavior after doing so. This helps alot! – ravenac95 Sep 02 '12 at 04:38
  • For my own betterment, can you explain why my answer fell short? It should let you avoid using any timeouts. – the paul Sep 04 '12 at 03:20
  • I've implemented your suggestions in [the answer to a related question](http://stackoverflow.com/a/12471855/4279) – jfs Sep 18 '12 at 07:45

There are a number of things you can change to make your code correct. The simplest thing I can think of is just to close your parent process's copy of the slave fd after forking, so that when the child exits and closes its own slave fd, the parent's select.select() will mark the master as available for read, and the subsequent os.read() will give an empty result and your program will complete. (The pty master won't see the slave end as being closed until both copies of the slave fd are closed.)

So, just one line:

os.close(slave)

...placed immediately after the subprocess.Popen call, ought to fix your problem.
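Putting that together, a minimal sketch of the corrected loop (the run() helper and the python -c child are illustrative, not part of the original post; the EIO guard reflects a Linux pty quirk that came up in the comments):

```python
import os
import pty
import select
import subprocess

def run(cmd):
    master, slave = pty.openpty()
    process = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                               stdout=slave, stderr=slave, close_fds=True)
    # The fix: drop the parent's copy of the slave fd, so that when the
    # child exits, the master end sees EOF instead of blocking forever.
    os.close(slave)
    chunks = []
    while True:
        rlist, _, _ = select.select([master], [], [])
        try:
            data = os.read(master, 1000)
        except OSError:  # on Linux a pty master can raise EIO at EOF
            break
        if not data:     # empty read: the slave side is fully closed
            break
        chunks.append(data)
    process.wait()
    os.close(master)
    return b''.join(chunks)
```

Note the select here has no timeout at all; closing the parent's slave fd is what makes the final select return.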

However, there are possibly better answers, depending on exactly what your requirements are. As someone else noted, you don't need a pty just to avoid buffering. You could use a bare os.pipe() in place of pty.openpty() (and treat the return value exactly the same). A bare OS pipe will never buffer; if the child process isn't buffering its output, then your select() and os.read() calls won't see buffering either. You would still need the os.close(slave) line, though.
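If a pty isn't needed, the same loop works with a bare pipe. This sketch (the helper name is mine) just swaps pty.openpty() for os.pipe(); plain end-of-file semantics apply, so an empty read reliably signals EOF:

```python
import os
import select
import subprocess

def run_with_pipe(cmd):
    # os.pipe() returns (read_fd, write_fd); use them like (master, slave)
    master, slave = os.pipe()
    process = subprocess.Popen(cmd, stdout=slave, stderr=slave,
                               close_fds=True)
    os.close(slave)  # still required, or the final select blocks forever
    chunks = []
    while True:
        rlist, _, _ = select.select([master], [], [])
        data = os.read(master, 1000)
        if not data:  # a pipe simply returns b'' at EOF, no EIO quirk
            break
        chunks.append(data)
    process.wait()
    os.close(master)
    return b''.join(chunks)
```

The trade-off is the one described above: the child sees a pipe, not a terminal, so its own stdio may block-buffer its output.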

But it's possible that you do need a pty for different reasons. If some of your child programs expect to be run interactively much of the time, then they might be checking to see if their stdin is a pty and behaving differently depending on the answer (lots of common utilities do this). If you really do want the child to think it has a terminal allocated for it, then the pty module is the way to go. Depending on how you'll run runner.py, you may need to switch from using subprocess to pty.fork(), so that the child has its session ID set and the pty pre-opened (or see the source for pty.py to see what it does and duplicate the appropriate parts in your subprocess object's preexec_fn callable).
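A sketch of the pty.fork() variant (the run_on_tty helper is hypothetical; the point is that the child really does see a terminal on its stdio):

```python
import os
import pty
import select

def run_on_tty(argv):
    pid, master = pty.fork()
    if pid == 0:  # child: stdin/stdout/stderr are now the pty slave
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(1)  # never fall through into the parent's code
    chunks = []
    while True:
        rlist, _, _ = select.select([master], [], [])
        try:
            data = os.read(master, 1000)
        except OSError:  # Linux pty masters raise EIO once the child exits
            break
        if not data:
            break
        chunks.append(data)
    os.close(master)
    os.waitpid(pid, 0)
    return b''.join(chunks)
```

Running a child that prints sys.stdout.isatty() through this helper should report True, unlike the pipe version.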

the paul
  • Indeed, the slave descriptor was not closed, and my bad for not noticing it. However, this line is not yet enough by itself, since os.read reacts to the killing of the child process with errno = EIO, thus all reads must be guarded with try-except checking for errno = EIO and the reason behind it. – Antti Haapala -- Слава Україні Sep 04 '12 at 09:40
  • Hmm, there shouldn't be any reason to get EIO when reading from a pipe. On the read side, you should just get a short read under POSIX semantics (so in this case, the empty string- the python EOF). – the paul Sep 04 '12 at 23:39
  • How interesting! I can't reproduce on linux 3.2 with a bare ubuntu-precise-12.04-amd64-server-20120616 image on ec2, after 200 runs. EIO is only supposed to be for hardware or unexpected FS errors. – the paul Sep 06 '12 at 22:45
  • Strange. "Linux ubuntu 3.2.0-26-generic #41-Ubuntu SMP Thu Jun 14 17:49:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux", failed on 5th run, "Linux 3.1.10-grbfs-custom #2 SMP Sun Jan 22 18:37:08 EET 2012 x86_64 GNU/Linux" failed on very first run. Are you sure you did not run the output.py by accident (happened to me just a moment ago :). Still, when running the parent, OSError: [Errno 5] Input/output error at output = os.read(f, 1000) – Antti Haapala -- Слава Україні Sep 07 '12 at 05:12
  • Also first attempt on 64bit ec2 custom precise image. – Antti Haapala -- Слава Україні Sep 07 '12 at 05:15
  • Quite sure- every run ended with the '`**ALL COMPLETED**`' message. That's very interesting- I hope you don't mind if we try to determine what the differentiating factor is. Did you try both of those kernels on the same physical machine? And your only change to the source was adding the `os.close(slave)` and the `s/pty\.openpty/os.pipe/` ? – the paul Sep 07 '12 at 18:45

From what I understand, you do not need to use pty. runner.py can be modified as

import subprocess
import sys

def main():
    process = subprocess.Popen(['python', 'outputter.py'],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    while process.poll() is None:
        output = process.stdout.readline()
        sys.stdout.write(output)
        sys.stdout.flush()
    print "**ALL COMPLETED**"

if __name__ == '__main__':
    main()

process.stdout.read(1) can be used instead of process.stdout.readline() for real-time output per character from the subprocess.

Note: If you do not require real-time output from the subprocess, use Popen.communicate to avoid the polling loop.
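For the non-real-time case, a sketch of the communicate() variant (the python -c child here stands in for outputter.py):

```python
import subprocess
import sys

# communicate() waits for the child to exit and returns its whole output
# at once, so there is no polling loop to get wrong.
process = subprocess.Popen(
    [sys.executable, '-c', 'print("all output at once")'],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = process.communicate()
sys.stdout.write(stdout.decode())
```

After communicate() returns, process.returncode is already set, so no separate poll() or wait() call is needed.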

panickal
  • panickal: Thanks for the response but I actually want to ensure that any output is not buffered, hence the need for pty. I'll edit the question to make it clear that it's a requirement. – ravenac95 Jun 23 '12 at 04:17
  • If the programs `runner.py` is interacting with are python ones, you can add `python -u` to Popen command for enabling unbuffered output. I tested with `outputter.py` and it worked. – panickal Jun 24 '12 at 00:31
  • unfortunately, they won't always be python applications :-/ – ravenac95 Jun 25 '12 at 07:25

When your child process exits, your parent process gets a SIGCHLD signal. By default this signal is ignored, but you can intercept it:

import sys
import signal

def handler(signum, frame):
    print 'Child has exited!'
    sys.exit(0)

signal.signal(signal.SIGCHLD, handler)

The signal also interrupts whatever blocking syscall the parent is in, such as select or read, and lets you do whatever you need to (cleanup, exit, etc.) in the handler function.

lithuak