Fair warning: I'm hacking around here out of curiosity. I have no specific reason to do what I'm doing below!

The following was all done with Python 2.7.13 on macOS 10.12.5.

I was hacking around with Python and thought it would be interesting to see what happens if I make stdout non-blocking:

fcntl.fcntl(sys.stdout.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)

The call to fcntl is definitely successful. I then try to write a large amount of data, bigger than the maximum buffer size of a pipe on OS X (65536 bytes). I do this in a variety of ways and get different outcomes: sometimes an exception, sometimes what seems to be a hard failure.
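
As an aside, passing just os.O_NONBLOCK to F_SETFL replaces whatever status flags were already set on the descriptor. A variant that preserves the existing flags, and reads them back to confirm the change took effect, would look something like this:

import fcntl
import os
import sys

fd = sys.stdout.fileno()

# Read the current status flags, add O_NONBLOCK, and write them back.
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# Confirm the flag actually took effect.
assert fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK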

Case 1

import fcntl, os, sys, time

fcntl.fcntl(sys.stdout.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
try:
    sys.stdout.write("A" * 65537)
except Exception as e:
    time.sleep(1)
    print "Caught: {}".format(e)

# Safety sleep to prevent quick exit
time.sleep(1)

This always throws the exception, and I see Caught: [Errno 35] Resource temporarily unavailable. That makes sense, I think: the higher-level file object wrapper is telling me the underlying write call failed.
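
For completeness, the exception raised here is an IOError, so its errno can be checked explicitly rather than catching everything. A sketch along the same lines as Case 1 (it assumes the same fcntl setup):

import errno

try:
    sys.stdout.write("A" * 65537)
except IOError as e:
    if e.errno != errno.EAGAIN:
        raise
    # EAGAIN (errno 35 on macOS): the non-blocking write couldn't complete.
    time.sleep(1)
    print "Caught EAGAIN: {}".format(e)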

Case 2

fcntl.fcntl(sys.stdout.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
try:
    sys.stdout.write("A" * 65537)
except Exception as e:
    print "Caught: {}".format(e)

# Safety sleep to prevent quick exit
time.sleep(1)

This sometimes throws the exception (Caught: [Errno 35] Resource temporarily unavailable), and sometimes no exception is caught and I see the following output instead:

close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr

Case 3

fcntl.fcntl(sys.stdout.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
try:
    sys.stdout.write("A" * 65537)
except Exception as e:
    print "Caught: {}".format(e)

# Safety sleep to prevent quick exit
time.sleep(1)

print "Slept"

This sometimes throws the exception (Caught: [Errno 35] Resource temporarily unavailable), and sometimes no exception is caught and I just see "Slept". It seems that by printing "Slept" I avoid the error message from Case 2.

Case 4

fcntl.fcntl(sys.stdout.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
try:
    os.write(sys.stdout.fileno(), "A" * 65537)
except Exception as e:
    print "Caught: {}".format(e)

# Safety sleep to prevent quick exit
time.sleep(1)

Always okay!
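
Presumably only part of the 65537 bytes actually makes it into the pipe here. A variant that records the return value on stderr, so that stdout isn't touched again, might look like this:

import fcntl, os, sys, time

fcntl.fcntl(sys.stdout.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
try:
    n = os.write(sys.stdout.fileno(), "A" * 65537)
    # os.write returns the number of bytes the kernel actually accepted,
    # which can be (and here usually is) less than 65537.
    sys.stderr.write("wrote {} of 65537 bytes\n".format(n))
except OSError as e:
    sys.stderr.write("Caught: {}\n".format(e))

# Safety sleep to prevent quick exit
time.sleep(1)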

Case 5

fcntl.fcntl(sys.stdout.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
try:
    print os.write(sys.stdout.fileno(), "A" * 65537)
except Exception as e:
    print "Caught: {}".format(e)

# Safety sleep to prevent quick exit
time.sleep(1)

This is sometimes okay, and sometimes prints the close failed in file object destructor error message.

My question is: why does this fail like this in Python? Am I doing something fundamentally bad here, either with Python or at the system level?

It seems that writing to stdout again too soon after a failed write causes the error message. The error doesn't appear to be an exception, and I have no idea where it's coming from.
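
If the goal were to keep writing after a failed attempt, one option would be to wait until stdout is writable again before touching it, e.g. with select. A sketch (write_all is just a hypothetical helper, not something used in the cases above):

import errno
import os
import select
import sys

def write_all(fd, data):
    # Push the data out in pieces, waiting for the descriptor to become
    # writable rather than retrying immediately.
    while data:
        select.select([], [fd], [])
        try:
            n = os.write(fd, data)
        except OSError as e:
            if e.errno != errno.EAGAIN:
                raise
            continue
        data = data[n:]

write_all(sys.stdout.fileno(), "A" * 65537)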

N.B. I can write the equivalent program in C and it works okay:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, const char * argv[])
{
    const size_t NUM_CHARS = 65537;
    char buf[NUM_CHARS];

    // Set stdout non-blocking
    fcntl(fileno(stdout), F_SETFL, O_NONBLOCK);

    // Try to write a large amount of data
    memset(buf, 65, NUM_CHARS);
    size_t written = fwrite(buf, 1, NUM_CHARS, stdout);

    // Wait briefly to give stdout a chance to be read from
    usleep(1000);

    // This will be written correctly
    sprintf(buf, "\nI wrote %zu bytes\n", written);
    fwrite(buf, 1, strlen(buf), stdout);
    return 0;
}
Andrew Parker
  • I *think* the problem is that with non-blocking IO, the call to `os.write` can return before the write is complete, and since this ends your simple test program, `sys.stdout` can also be closed before the write completes. Note that your C program waits (`usleep(1000)`) where your Python code does not. – chepner Jul 06 '17 at 12:10
  • Thanks for the response. I should have posted a more complete snippet of python code. I had a `time.sleep(1)` after the write call and it makes no difference :( – Andrew Parker Jul 06 '17 at 12:23
  • What's your OS? You can try running the Python under something like `strace` and see what system calls get made. – Andrew Henle Jul 06 '17 at 12:31
  • Let me update the post a bit as I can clarify a few things. There's slightly different behaviour from various calls. – Andrew Parker Jul 06 '17 at 12:35
  • @Andrew Parker using a delay instead of fflush. Interesting idea actually. Did it come from some Arduino experience? :/ – 0___________ Jul 06 '17 at 12:35
  • Sorry, the post changed quite a bit there. Hopefully it's much clearer now in terms of what I see; I've certainly removed a few inconsistencies in my own mind. Not quite sure what the "expected" behaviour is yet. @PeterJ, just a minor mind wander while writing a subprocess wrapper which will suck from a subprocess pipe in real time. I was deliberately filling a pipe in the subprocess and not reading from it, and just wondered what would happen if I made stdout of the subprocess non-blocking. – Andrew Parker Jul 06 '17 at 12:51

1 Answer

This is interesting. There are a few things that I've found so far:

Case 1

This is because sys.stdout.write will either write the whole string or throw an exception, which isn't the desired behavior when using O_NONBLOCK. When the underlying call to write returns EAGAIN (errno 35 on OS X), the write should be retried with the remaining data. os.write should be used instead, and its return value should be checked to make sure all of the data was written.

This code works as expected:

import errno
import fcntl
import os
import sys

fcntl.fcntl(sys.stdout.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)

def stdout_write(s):
    written = 0
    while written < len(s):
        try:
            written += os.write(sys.stdout.fileno(), s[written:])
        except OSError as e:
            # EAGAIN just means the pipe is full right now; retry.
            # Anything else is a genuine error.
            if e.errno != errno.EAGAIN:
                raise

stdout_write("A" * 65537)

Case 2

I suspect that this error message is due to https://bugs.python.org/issue11380:

close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr

I'm not sure why it only happens sometimes. It may be because there is a print in the except block that tries to use the same stdout that a write just failed on.
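
One way to test that theory would be to send the diagnostic to stderr instead of the stdout that just failed, along these lines:

import fcntl, os, sys, time

fcntl.fcntl(sys.stdout.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
try:
    sys.stdout.write("A" * 65537)
except IOError as e:
    # Report the failure on stderr so the failed stdout isn't used again.
    sys.stderr.write("Caught: {}\n".format(e))

time.sleep(1)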

Case 3

This is similar to Case 1. This code always works for me:

import errno
import fcntl
import os
import sys
import time

fcntl.fcntl(sys.stdout.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)

def stdout_write(s):
    written = 0
    while written < len(s):
        try:
            written += os.write(sys.stdout.fileno(), s[written:])
        except OSError as e:
            # EAGAIN just means the pipe is full right now; retry.
            # Anything else is a genuine error.
            if e.errno != errno.EAGAIN:
                raise

stdout_write("A" * 65537)

time.sleep(1)

print "Slept"

Case 4

Make sure you check the return value of os.write; I suspect that the full 65537 bytes are not actually being written.
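
For example, something along these lines (building on the Case 4 snippet from the question; the byte count reported is simply whatever the kernel accepted):

n = os.write(sys.stdout.fileno(), "A" * 65537)
# n is the number of bytes actually written, which may well be less than 65537.
sys.stderr.write("os.write accepted {} bytes\n".format(n))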

Case 5

This is similar to Case 2.

Tim
  • Thanks for the input @Tim. I hadn't got a chance to come back to this over the last week. I've taken a much longer look at the issue. In short, the exception from `sys.stdout.write` is totally expected (having looked at the source). `file.write` will raise an exception if either fewer bytes were written than expected or `ferror` returns non-zero on the underlying FILE handle. So the mystery is of course around the error message. I've looked at the source and debugged my own build, and I've yet to 100% identify the root cause (will continue next comment). – Andrew Parker Jul 16 '17 at 10:36
  • The error happens during interpreter shutdown, when the interpreter is closing the std file handles (which it does in the order 1, 0, 2, i.e. stdout, stdin, stderr). It's the closing of stdout which fails (unsurprisingly); specifically, `fclose` fails. It occurs exactly when the `_flags` of the underlying `FILE` handle for `stdout` have `__SERR` set (which is a libc implementation detail on OS X). I thought it might be during the flush call when closing, but it seems not. Anyway, Python is behaving "well" in that it prints the error when it fails to close `stdout` properly. However, ... – Andrew Parker Jul 16 '17 at 10:39
  • I don't fully understand what puts the `FILE` handle in that state. I've tried to repro it in C with no luck. I've also hacked around in the interpreter and I'm none the wiser. I'd imagine I *can* find the issue, but I've already gone deep enough down the rabbit hole for now, so I need to look away for a while :) I might come back to this out of curiosity. It's vaguely possible there's a weird edge-case bug in Python here, but even if there is, it's so unlikely that anyone would do this that it hardly matters! – Andrew Parker Jul 16 '17 at 10:42