I have a simple Python script that dies after a random amount of output.
IMHO, it should work.
This is Linux-specific, and Python 2 and Python 3 have the same problem. I'm not sure about Python on macOS (I don't have a Mac handy), and it is not an issue on Windows.
I believe the problem lies in how Linux interprets blocking vs. non-blocking mode for stdio. In my view, and I believe largely in Python's view too, STDIN and STDOUT are two different files.
This post discusses the problem in terms of C code: "When non-blocking I/O is turned on for stdout, is it correct for the OS to turn it on for stdin too?"
The problem: set the os.O_NONBLOCK flag on STDIN.
Expected result:
O_NONBLOCK should apply only to the file it was set on (STDIN).
Actual Result:
O_NONBLOCK is set on both STDIN and STDOUT
As a result, when the application writes to stdout (or stderr) in the normal way (for example, printing log or debug information), Python 2 or Python 3 exits with an I/O error at some random later point.
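The failure itself can be reproduced without a terminal. The sketch below is Python 3 only, and `write_until_eagain` is a hypothetical helper name. It uses a pipe that nobody reads from as a stand-in for a consumer of stdout that is slower than the writer. Once the write end is non-blocking and the kernel buffer fills, write(2) fails with EAGAIN, which Python 3 raises as `BlockingIOError`.

```python
import errno
import fcntl
import os


def write_until_eagain(chunk=b"x" * 1024):
    """Fill a non-blocking pipe that nobody drains.

    Returns (bytes_written, errno_of_failure). The pipe stands in for a
    terminal or pipe whose reader is slower than the writer (an assumption).
    """
    r, w = os.pipe()
    try:
        # Put the write end into non-blocking mode, as happened to stdout.
        flags = fcntl.fcntl(w, fcntl.F_GETFL)
        fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)
        total = 0
        while True:
            try:
                total += os.write(w, chunk)
            except BlockingIOError as exc:
                # Kernel buffer is full: write(2) returned -1 EAGAIN.
                return total, exc.errno
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    written, err = write_until_eagain()
    print("wrote %d bytes, then errno %s" % (written, errno.errorcode[err]))
```

With a blocking descriptor, the same loop would simply wait for the reader instead of failing.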
The code below demonstrates the problem:

```python
import sys
import fcntl
import os


def show_flags(why):
    # Print the file status flags currently reported for stdout (fd 1).
    print("%s: FLAGS: 0x%08x" % (why, fcntl.fcntl(sys.stdout.fileno(), fcntl.F_GETFL)))


show_flags("startup")

# Read stdin's current file status flags.
f = fcntl.fcntl(sys.stdin.fileno(), fcntl.F_GETFL)
show_flags("got-stdin")

# Add O_NONBLOCK and write the flags back, to stdin only.
f = f | os.O_NONBLOCK
fcntl.fcntl(sys.stdin.fileno(), fcntl.F_SETFL, f)
show_flags("set-stdin")  # stdout now reports O_NONBLOCK as well

# Produce lots of output to trigger the error.
for x in range(0, 10000):
    sys.stdout.write("x=%10d -------- lots of text here to fill buffer\n" % x)
```
If I run `strace -o log python test.py` and capture the log output, the relevant portion showing the error is:
```
(start)
fcntl(1, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
write(1, "startup: FLAGS: 0x00008002\n", 27) = 27
fcntl(0, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl(1, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
write(1, "got-stdin: FLAGS: 0x00008002\n", 29) = 29
fcntl(0, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE) = 0
fcntl(1, F_GETFL) = 0x8802 (flags O_RDWR|O_NONBLOCK|O_LARGEFILE)
write(1, "set-stdin: FLAGS: 0x00008802\n", 29) = 29
write(1, "x=         0 -------- lots of te"..., 55) = 55
write(1, "x=         1 -------- lots of te"..., 55) = 55
```
After some random number of writes (300 to 500), Linux returns an error:
```
write(1, "x=       495 -------- lots of te"..., 55) = 55
write(1, "x=       496 -------- lots of te"..., 55) = -1 EAGAIN (Resource temporarily unavailable)
write(2, "Traceback (most recent call last"..., 35) = -1 EAGAIN (Resource temporarily unavailable)
```
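In the strace output, `fcntl(0, F_SETFL, ...)` is immediately followed by fd 1 reporting O_NONBLOCK, and both descriptors report O_RDWR. That pattern is consistent with fds 0 and 1 being two descriptors for one open file description (for example, a single terminal open shared by 0, 1 and 2; this is an assumption about how the process was launched). File status flags such as O_NONBLOCK belong to the open file description, not to the individual descriptor, so every descriptor created with dup() sees the change. The sketch below (hypothetical helper names, with a pipe in place of the terminal) shows the effect.

```python
import fcntl
import os


def nonblocking(fd):
    """Return True if O_NONBLOCK is set in fd's file status flags."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


def demo_shared_flags():
    r, w = os.pipe()
    # dup() creates a second descriptor for the SAME open file description,
    # much as fds 0, 1 and 2 may share one terminal open (assumed here).
    w2 = os.dup(w)
    try:
        before = (nonblocking(w), nonblocking(w2))
        flags = fcntl.fcntl(w, fcntl.F_GETFL)
        fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)  # set on w only
        after = (nonblocking(w), nonblocking(w2))
        return before, after
    finally:
        for fd in (r, w, w2):
            os.close(fd)


if __name__ == "__main__":
    print(demo_shared_flags())
```

Setting the flag through `w` also turns it on for `w2`, which mirrors fd 1 picking up the flag set through fd 0.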
Suggestions?
I can't easily dig into the middle of a rather large application that likes to spew debug log output.
The problem: the application needs to monitor STDIN in a non-blocking way so that it can process output from the parent process and act on it. Writing a log message should not cause the app to die at random places.
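One commonly used way to watch STDIN without setting O_NONBLOCK at all is to leave the descriptor blocking and ask `select()` whether it is readable before reading. The sketch below assumes the parent sends data over a pipe or terminal, and `poll_input` is a hypothetical helper name. Because O_NONBLOCK is never touched, the shared file description that stdout writes through stays blocking.

```python
import os
import select


def poll_input(fd, timeout=0.0, max_bytes=4096):
    """Return data waiting on fd, or None if nothing is ready yet.

    fd stays in blocking mode; select() reports when a read will not block.
    Returns b"" at end of file.
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None
    # Data (or EOF) is available, so this read returns without waiting.
    return os.read(fd, max_bytes)
```

In the application's main loop this would be called as `poll_input(sys.stdin.fileno())`, treating `None` as "nothing from the parent yet".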
Wrapping every log statement in try/except blocks is insanity.
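Rather than wrapping each log call, the retry could live in a single object assigned to `sys.stdout` and `sys.stderr`. This is a minimal sketch under stated assumptions: Python 3, text encoded as UTF-8, and `write_all` / `EagainSafeWriter` are hypothetical names, not part of any library. When `os.write` raises `BlockingIOError`, the writer waits with `select()` until the descriptor is writable again and resumes with the unwritten remainder. It implements only `write`, `flush`, `fileno` and `isatty`; code that needs more of the text-stream interface would need more methods.

```python
import os
import select


def write_all(fd, data):
    """Write every byte of data to fd, waiting out EAGAIN instead of failing."""
    view = memoryview(data)
    while view:
        try:
            n = os.write(fd, view)
        except BlockingIOError:
            # Non-blocking fd whose buffer is full: wait until it is writable.
            select.select([], [fd], [])
            continue
        view = view[n:]  # drop the bytes that were written (handles partial writes)


class EagainSafeWriter(object):
    """Minimal stand-in for sys.stdout / sys.stderr (hypothetical)."""

    def __init__(self, fd, encoding="utf-8"):
        self.fd = fd
        self.encoding = encoding

    def write(self, text):
        write_all(self.fd, text.encode(self.encoding))
        return len(text)

    def flush(self):
        pass  # nothing is buffered; every write() goes straight to the fd

    def fileno(self):
        return self.fd

    def isatty(self):
        return os.isatty(self.fd)
```

Usage would be along the lines of `sys.stdout.flush(); sys.stdout = EagainSafeWriter(1)` and the same for fd 2, so that a traceback written to stderr is not lost to EAGAIN either. While the buffer is full the writer waits, which is what a blocking stdout would have done anyway.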
Asking the Linux gods to change this behavior is not going to work.
Python 3 appears to retry at least once, which sometimes succeeds but mostly does not; Python 2 does not retry at all.
Fixing this INSIDE Python via a custom hack is also insanity: I cannot easily distribute a custom build of the Python executable to everybody who needs to run my application.
Suggestions?