2

I want to read stdout and stderr from a subprocess in the same thread as described in this post. While running the code inside Python2.7 works as expected, the select() call in Python3.3 seems to do not what it should.

Have a look - here is a script that would print two lines on both stdout and stderr, then wait and then repeat this a couple of times:

import time, sys
for i in range(5):
    sys.stdout.write("std: %d\n" % i)
    sys.stdout.write("std: %d\n" % i)
    sys.stderr.write("err: %d\n" % i)
    sys.stderr.write("err: %d\n" % i)
    time.sleep(2)

The problematic script will start the script above in a subprocess and read it's stdout and stderr as described in the posted link:

import subprocess
import select

p = subprocess.Popen(['/usr/bin/env', 'python', '-u', 'test-output.py'],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)

r = [p.stdout.fileno(), p.stderr.fileno()]

while p.poll() is None:
    print("select")
    ret = select.select(r, [], [])

    for fd in ret[0]:
        if fd == p.stdout.fileno():
            print("readline std")
            print("stdout: " + p.stdout.readline().decode().strip())
        if fd == p.stderr.fileno():
            print("readline err")
            print("stderr: " + p.stderr.readline().decode().strip())

Note that I start the Python subprocess using the -u option which causes Python to not buffer stdout and stderr. Also I print some text before calling select() and readline() to see, where the script blocks.

And here is the problem: running the script in Python3, after each cycle the output blocks for 2 seconds despite the fact, that two more lines are waiting to be read. And as indicated by the text before each call of select() you can see that it's select() which is blocking (not readline()).

My first thought was that select() only resumes on a flush on Python3 while Python2 it returns always when there's data available but in this case only one line would be read each 2 seconds (which is not the case!)

So my question is: is this a bug in Python3-select()? Did I misunderstand the behavior of select()? And is there a way to workaround this behavior without having to start a thread for each pipe?

Output when running Python3:

select
readline std
stdout: std: 0
readline err
stderr: err: 0
select            <--- here the script blocks for 2 seconds
readline std
stdout: std: 0
select
readline std
stdout: std: 1
readline err
stderr: err: 0
select            <--- here the script should block (but doesn't)
readline err
stderr: err: 1
select            <--- here the script blocks for 2 seconds
readline std
stdout: std: 1
readline err
stderr: err: 1
select            <--- here the script should block (but doesn't)
readline std
stdout: std: 2
readline err
stderr: err: 2
select
.
.

Edit: Please note that it has no influence whether the child process is a Python script. The following C++ program has the same effect:

int main() {
    for (int i = 0; i < 4; ++i) {
        std::cout << "out: " << i << std::endl;
        std::cout << "out: " << i << std::endl;
        std::cerr << "err: " << i << std::endl;
        std::cerr << "err: " << i << std::endl;
        fflush(stdout);
        fflush(stderr);
        usleep(2000000);
}}
Community
  • 1
  • 1
frans
  • 8,868
  • 11
  • 58
  • 132
  • 1
    have you tried obvious things: call `.flush()` explicitly (`-u` behaves differently in Python 3 -- the streams are line-buffered); set `bufsize=0`; use `os.read(fd, 512)` instead of `.readline()`? The code itself is broken e.g.: there could be data buffered even if `p.poll() is not None` – jfs Jan 16 '15 at 19:17
  • Another change is that sys.stdout/err became io.TextIOWrapper objects in 3.x, with consistent behavior (and defaults) across platforms. Before, for CPython, they were file objects that thinly wrapped the stdio structures of the particular C compiler used. I suggest carefully reading the io module docs. – Terry Jan Reedy Jan 16 '15 at 20:48
  • @J.F.Sebastian: the child process doesn't have to be a Python program. A C++ program which does the same (and flushes explicitly) will show the same effect. Also the `.poll()` has no influence - you can replace the line with `while True`. Currently I'm pretty convinced that `select()` is broken in Python3. – frans Jan 19 '15 at 07:47
  • 1
    @frans: details matter. What happens if you use `os.read(fd, 512)` as I've suggested? – jfs Jan 19 '15 at 08:10
  • @J.F.Sebastian: you were right with the `bufsize=0` hint (and with `read()`, too) I didn't check it again because I did that already, but I think I had another issue at that time. Unfortunately I can't mark this as answer but your comments are worth the upvote. – frans Jan 19 '15 at 08:41

1 Answers1

1

It seems that the reason is buffering in subprocess.PIPE and first readline() call reads all available data (i.e., two lines) and returns first one.

After that, there is no unread data in pipe, so select() is not returning immediately. You can check this by doubling readline calls:

print("stdout: " + p.stdout.readline().decode().strip())
print("stdout: " + p.stdout.readline().decode().strip())

and ensuring that second readline() call doesn't block.

One solution is to disable buffering using bufsize=0:

p = subprocess.Popen(['/usr/bin/env', 'python', '-u', 'test-output.py'],
                 stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=0)

Another possible solution is to do a non-blocking readline() or to ask pipe file object its read buffer size, but I don't know is it possible.

You can also read directly from p.stdout.fileno() to implement non-blocking readline().

Update: Python2 vs. Python3

The reason why Python3 differs from Python2 here is likely to be in new I/O module (PEP 31136). See this note:

The BufferedIOBase methods signatures are mostly identical to that of RawIOBase (exceptions: write() returns None , read() 's argument is optional), but may have different semantics. In particular, BufferedIOBase implementations may read more data than requested or delay writing data using buffers.

gavv
  • 4,649
  • 1
  • 23
  • 40
  • It looks like you're right, setting `bufsize` to 0 (as J.F. Sebastian also suggested) does the trick . But I still don't see why the Python 2 behavior is different. Isn't the Python2 `read()` buffered, too? – frans Jan 19 '15 at 08:16
  • @frans, it seems that in Python2, readline() reads only one line and returns it, while in Python3 it reads all available data and returns first line. I have updated the answer. – gavv Jan 19 '15 at 08:42