
I have a simple Python script that dies after a random amount of output.

IMHO - it should function

This is Linux-specific and happens with both python2 and python3. I'm not sure about Python on a Mac, since I don't have one handy, and it is not an issue with Python on Windows.

I believe the problem is in how Linux interprets blocking versus non-blocking mode on the standard streams. In my view, and to a large extent I believe in Python's view, STDIN and STDOUT are two different files.

This post talks about the problem in terms of C code: When non-blocking I/O is turned on for stdout, is it correct for the OS to turn it on for stdin too?
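A likely explanation, stated as background rather than as a confirmed diagnosis of this exact setup: on POSIX systems the file status flags (including O_NONBLOCK) belong to the open file description, not to the individual descriptor. When a shell starts a program on a terminal, descriptors 0, 1 and 2 are typically duplicates of one description, so a flag set through one is visible through the others. The sketch below illustrates that rule with a pipe standing in for the terminal; the helper names `is_nonblocking` and `set_nonblocking` are made up for this example.

```python
import fcntl
import os

def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set in the file status flags of fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)

def set_nonblocking(fd):
    """Add O_NONBLOCK to the file status flags reachable through fd."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# A pipe stands in for the terminal (an assumption made for illustration).
read_end, write_end = os.pipe()

# os.dup() creates a second descriptor for the SAME open file description,
# which is how stdin/stdout/stderr usually relate to each other on a tty.
alias = os.dup(read_end)

set_nonblocking(read_end)

# The duplicate sees the flag, because the flag lives on the shared description.
shared = is_nonblocking(alias)

# The write end is a separate open file description, so it is unaffected.
independent = is_nonblocking(write_end)

print("dup of read end non-blocking:", shared)
print("write end non-blocking:", independent)
```

If this is what happens here, redirecting stdout to a file or a separate pipe should make the problem disappear, since stdout would then have its own description.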

The problem: set the os.O_NONBLOCK flag on STDIN.

Expected result:

O_NONBLOCK should apply only to the file it was applied to.

Actual Result:

O_NONBLOCK is set on both STDIN and STDOUT

As a result: as your application writes to stdout (or stderr) in the standard way (for example, outputting log or debug information), python2 or python3 exits with an I/O error at some random later point.

The code below demonstrates the problem

import sys
import fcntl
import os

def show_flags(why):
    print("%s: FLAGS: 0x%08x" % (why,fcntl.fcntl( sys.stdout.fileno(), fcntl.F_GETFL )))

show_flags("startup")
f = fcntl.fcntl( sys.stdin.fileno(), fcntl.F_GETFL )
show_flags("got-stdin")
f = f | os.O_NONBLOCK
fcntl.fcntl( sys.stdin.fileno(), fcntl.F_SETFL, f )
show_flags("set-stdin")

# produce spewing output to show the error.
for x in range(0,10000):
    sys.stdout.write("x=%10d -------- lots of text here to fill buffer\n" % x)

If I run "strace -o log python test.py" and capture the log output, the relevant portion showing the error is:

(start)

fcntl(1, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
write(1, "startup: FLAGS: 0x00008002\n", 27) = 27
fcntl(0, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl(1, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
write(1, "got-stdin: FLAGS: 0x00008002\n", 29) = 29
fcntl(0, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE) = 0
fcntl(1, F_GETFL)                       = 0x8802 (flags O_RDWR|O_NONBLOCK|O_LARGEFILE)
write(1, "set-stdin: FLAGS: 0x00008802\n", 29) = 29
write(1, "x=         0 -------- lots of te"..., 55) = 55
write(1, "x=         1 -------- lots of te"..., 55) = 55

After some random number of writes (300 to 500), Linux returns an error:

write(1, "x=       495 -------- lots of te"..., 55) = 55
write(1, "x=       496 -------- lots of te"..., 55) = -1 EAGAIN (Resource temporarily unavailable)
write(2, "Traceback (most recent call last"..., 35) = -1 EAGAIN (Resource temporarily unavailable)

Suggestions?

I can't easily go inside the middle of a rather large application that likes to spew debug log output...

Problem: The application wants/needs to monitor/read STDIN in a non-blocking way so that it can process output from the parent and act upon it. Writing a log should not cause the app to die at random places.

Wrapping every log statement in "try/except" blocks is insanity.

Asking the Linux gods to please change this behavior is not going to happen.

Python3 appears to retry at least once, which sometimes succeeds but mostly does not; Python2 does not retry at all.

Fixing this INSIDE Python via a custom hack is also insanity... I cannot easily distribute my custom version of Python.EXE to everybody who needs to run my application.

Suggestions?
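One direction that avoids touching O_NONBLOCK at all is to leave every descriptor in blocking mode and ask `select` whether input is ready before reading. The following is only a sketch under assumptions: the function name `read_available` is hypothetical, a pipe stands in for STDIN so the example is self-contained, and reads go through `os.read` on the raw descriptor rather than through the buffered `sys.stdin` object (mixing the two could leave data stranded in Python's buffer).

```python
import os
import select

def read_available(fd, max_bytes=4096):
    """Poll fd without blocking and without setting O_NONBLOCK.

    Returns the bytes that were ready, b"" if nothing is ready yet,
    or None once the other side has closed (end of file).
    """
    # A timeout of 0 makes select() return immediately.
    ready, _, _ = select.select([fd], [], [], 0)
    if not ready:
        return b""
    # select() said the descriptor is readable, so this read does not block.
    data = os.read(fd, max_bytes)
    return data if data else None

# Demonstration with a pipe standing in for STDIN.
read_end, write_end = os.pipe()

nothing_yet = read_available(read_end)   # no data has been written so far
os.write(write_end, b"command from parent\n")
first_chunk = read_available(read_end)   # the data written above
os.close(write_end)
at_eof = read_available(read_end)        # writer closed: end of file

print(repr(nothing_yet), repr(first_chunk), repr(at_eof))
```

In a real program the call would presumably be `read_available(sys.stdin.fileno())` inside the main loop; because no flag is changed, writes to stdout and stderr keep their normal blocking behavior.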

user3696153
  • You will likely have much better success with the blocking I/O APIs in Python, as those are much more commonly used. You can start a separate thread to listen on stdin and asynchronously report events back to your main thread without needing to resort to non-blocking I/O and `select`. If that's not high enough performance for you and you need true non-blocking I/O then Python is probably the wrong language anyway and you'll need to at least write the performance-sensitive parts of your program in C. – Daniel Pryden Nov 27 '17 at 02:19
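A minimal sketch of the thread-based approach described in the comment above, assuming Python 3 (the module is named `Queue` in Python 2). The names `start_line_reader` and `drain` are hypothetical, and an in-memory `io.StringIO` stands in for `sys.stdin` so the example runs on its own.

```python
import io
import queue
import threading

def start_line_reader(stream):
    """Read lines from stream on a daemon thread.

    Returns a queue that receives each line, followed by None at end of file.
    The stream itself stays in ordinary blocking mode.
    """
    lines = queue.Queue()

    def pump():
        for line in stream:      # blocks in this thread only
            lines.put(line)
        lines.put(None)          # sentinel: the stream has ended

    threading.Thread(target=pump, daemon=True).start()
    return lines

def drain(lines):
    """Return everything queued so far without blocking the caller."""
    items = []
    while True:
        try:
            items.append(lines.get_nowait())
        except queue.Empty:
            return items

# Demonstration: a StringIO stands in for sys.stdin.
fake_stdin = io.StringIO("start\nstop\n")
inbox = start_line_reader(fake_stdin)

received = []
while True:
    item = inbox.get(timeout=5)  # wait here only so the demo is deterministic
    if item is None:
        break
    received.append(item)

print(received)
```

In an application, the main loop would call `drain(inbox)` between other work instead of waiting on `get`, and would pass `sys.stdin` to `start_line_reader`. Since nothing sets O_NONBLOCK, logging to stdout and stderr is unaffected.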
