
I have been trying to implement a wrapper around subprocess as follows:

import copy
import re
import select
import subprocess
import sys

# create_debug(), create_info() and LOGGER are assumed to be this module's
# own logging helpers.


def ans_cmd_stream_color(inputcmd):
    """Driver function for local ansible commands.

    Stream stdout to stdout and log file with color.
    Runs <inputcmd> via subprocess.
    Returns return code, stdout, stderr as dict.
    """
    fullcmd = inputcmd
    create_debug('Enabling colorful ansible output.', LOGGER)
    create_info('Running command: ' + fullcmd, LOGGER, True)
    p = subprocess.Popen('export ANSIBLE_FORCE_COLOR=true; ' + fullcmd,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE,
                         shell=True)
    stdout_l = []
    stderr_l = []
    rcode = 0
    # Pattern that strips ANSI color escape sequences.
    ansi_escape = re.compile(r'\x1b[^m]*m')
    # Get the unbuffered IO action going.
    try:
        # Wait until at least one of the pipes is readable.
        reads = [p.stdout.fileno(), p.stderr.fileno()]
        ret = select.select(reads, [], [])
        # Print line by line.
        while True:
            for fd in ret[0]:
                if fd == p.stdout.fileno():
                    line = p.stdout.readline()
                    sys.stdout.write(line.encode('utf-8'))
                    stdout_l.append(ansi_escape.sub('', line.encode('utf-8')))
                if fd == p.stderr.fileno():
                    line = p.stderr.readline()
                    sys.stderr.write(line.encode('utf-8'))
                    stderr_l.append(ansi_escape.sub('', line.encode('utf-8')))
            # Break when the process is done.
            if p.poll() is not None:
                rcode = p.returncode
                break
    except BaseException as e:
        raise e
    outstr = ''.join(stdout_l)
    errstr = ''.join(stderr_l)
    outstr, errstr = outstr.rstrip('\n'), errstr.rstrip('\n')
    expstr = errstr.strip('ERROR: ')
    if len(expstr) >= 1:
        create_info('Command: ' + fullcmd + ': ' + expstr + '\n', LOGGER,
                    True)
        if rcode == 0:
            rcode = 1
    else:
        create_info(outstr + '\n', LOGGER)
        if rcode == 0:
            create_info('Command: ' + fullcmd + ' ran successfully.', LOGGER,
                        True)
        expstr = False
    ret_dict = {inputcmd: {}}
    ret_dict[inputcmd]['rcode'] = rcode
    ret_dict[inputcmd]['stdout'] = outstr
    ret_dict[inputcmd]['stderr'] = expstr
    return copy.deepcopy(ret_dict)

The idea is to print the subprocess command's output as it streams and then return the info to the caller. The issue is that, even when using a direct io.open, the subprocess PIPE is still buffered unless I set:

os.environ["PYTHONUNBUFFERED"] = "1"

Which is not ideal. Any ideas or has anybody encountered this issue?
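
For anyone who wants to reproduce the effect in isolation, here is a minimal sketch (the child script below is a made-up example): because the child's stdout is a pipe rather than a terminal, CPython block-buffers it, so the parent sees nothing until the child exits unless the child flushes or runs unbuffered.

import subprocess
import sys

# Hypothetical child that prints one line per second.
child_code = 'import time\nfor i in range(3):\n    print(i)\n    time.sleep(1)\n'
p = subprocess.Popen([sys.executable, '-c', child_code],
                     stdout=subprocess.PIPE)
for line in p.stdout:
    # Without -u or PYTHONUNBUFFERED=1 in the child, these lines all arrive
    # at once when the child exits instead of one per second.
    print('got:', line.decode().rstrip())
p.wait()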

UPDATE: With ansible, you also need to disable Python's output buffering in the child process before the subprocess pipes will stream line by line:

def ans_cmd_stream_color(inputcmd):
    """Driver function for local ansible commands.

    Stream stdout to stdout and log file with color.
    Runs <inputcmd> via subprocess.
    Returns return code, stdout, stderr as dict.
    """
    fullcmd = inputcmd
    create_debug('Enabling colorful ansible output.', LOGGER)
    create_info('Running command: ' + fullcmd, LOGGER, True)
    p = subprocess.Popen('export ANSIBLE_FORCE_COLOR=true; ' +
                         'export PYTHONUNBUFFERED=1; ' + fullcmd,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE,
                         shell=True)
    stdout_l = []
    stderr_l = []
    rcode = 0
    # Pattern that strips ANSI color escape sequences.
    ansi_escape = re.compile(r'\x1b[^m]*m')
    # Get the unbuffered IO action going.
    try:
        # Wait until at least one of the pipes is readable.
        reads = [p.stdout.fileno(), p.stderr.fileno()]
        ret = select.select(reads, [], [])
        # Print line by line.
        while True:
            for fd in ret[0]:
                if fd == p.stdout.fileno():
                    line = p.stdout.readline()
                    sys.stdout.write(line.encode('utf-8'))
                    stdout_l.append(ansi_escape.sub('', line.encode('utf-8')))
                if fd == p.stderr.fileno():
                    line = p.stderr.readline()
                    sys.stderr.write(line.encode('utf-8'))
                    stderr_l.append(ansi_escape.sub('', line.encode('utf-8')))
            # Break when the process is done.
            if p.poll() is not None:
                rcode = p.returncode
                break
    except BaseException as e:
        raise e
    outstr = ''.join(stdout_l)
    errstr = ''.join(stderr_l)
    outstr, errstr = outstr.rstrip('\n'), errstr.rstrip('\n')
    expstr = errstr.strip('ERROR: ')
    if len(expstr) >= 1:
        create_info('Command: ' + fullcmd + ': ' + expstr + '\n', LOGGER,
                    True)
        if rcode == 0:
            rcode = 1
    else:
        create_info(outstr + '\n', LOGGER)
        if rcode == 0:
            create_info('Command: ' + fullcmd + ' ran successfully.', LOGGER,
                        True)
        expstr = False
    ret_dict = {inputcmd: {}}
    ret_dict[inputcmd]['rcode'] = rcode
    ret_dict[inputcmd]['stdout'] = outstr
    ret_dict[inputcmd]['stderr'] = expstr
    return copy.deepcopy(ret_dict)
  • 1- don't put the solution into the question, post it as an answer instead (to allow commenting, voting on the solution). 2- `select()` call in your code is blocking. The `select()` loop could be simpler [Run command and get its stdout, stderr separately in near real time like in a terminal](http://stackoverflow.com/a/31953436/4279). If it is ok to merge stdout/stderr then [the code can be even simpler](http://stackoverflow.com/a/17698359/4279). 3- Are you asking how to unbuffer *internal buffers* in an arbitrary subprocess? (it is impossible in the general case. Though `-u` works for `python`) – jfs May 10 '16 at 18:55
  • unrelated: to pass an envvar to the child, use `Popen(..., env=dict(os.environ, ANSIBLE_FORCE_COLOR='true', PYTHONUNBUFFERED='1'))` – jfs May 10 '16 at 18:57
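
A minimal sketch of what that comment suggests, applied to the question's call (it assumes the same `fullcmd` string as above; only the environment handling changes, so nothing needs to be exported in the shell string or set in the parent's os.environ):

import os
import subprocess

# Pass the variables to this one child only, instead of exporting them in the
# shell command or mutating os.environ for the whole parent process.
p = subprocess.Popen(fullcmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     shell=True,
                     env=dict(os.environ,
                              ANSIBLE_FORCE_COLOR='true',
                              PYTHONUNBUFFERED='1'))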

1 Answer


You should probably read directly from the subprocess pipes. Something like the following will send the standard output to the info logger and the standard error to the error logger.

import logging
import subprocess

logging.basicConfig(level=logging.INFO)

# cmd is the command to run, e.g. a list of arguments such as
# ["ansible-playbook", "site.yml"] (example only).
proc = subprocess.Popen(cmd,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
cont = True
while cont:
    cont = False
    # Relay one line of standard output, if any, to the info logger.
    line = proc.stdout.readline()
    if not line == b"":
        out = line.decode("utf-8").rstrip()
        logging.info(out)
        cont = True

    # Relay one line of standard error, if any, to the error logger.
    line = proc.stderr.readline()
    if not line == b"":
        out = line.decode("utf-8").rstrip()
        logging.error(out)
        cont = True

    # Stop once both pipes are exhausted and the process has exited.
    if not cont and proc.poll() is not None:
        break
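
Since the question keeps ANSIBLE_FORCE_COLOR enabled, the lines read this way will still contain ANSI escape codes; if they should also go to a log file without color, the regex from the question can be applied before logging. A small self-contained sketch (the colored string is just an example):

import logging
import re

logging.basicConfig(level=logging.INFO)

# Same pattern the question uses to drop ANSI color sequences.
ansi_escape = re.compile(r'\x1b[^m]*m')

colored = "\x1b[0;32mok: [localhost]\x1b[0m"   # example colored line
logging.info(ansi_escape.sub('', colored))      # logs: ok: [localhost]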

To address the buffering issue, per [this question](http://stackoverflow.com/questions/230751/how-to-flush-output-of-python-print), either the subordinate Python script must explicitly flush its buffers, or the environment variable PYTHONUNBUFFERED must be set to a non-empty string.
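
For completeness, this is what explicit flushing looks like in the subordinate script if it is Python (a sketch; `flush=True` on `print` needs Python 3.3+, while `sys.stdout.flush()` works everywhere):

import sys

# Either flush on each print...
print("one line of progress", flush=True)

# ...or write and flush the stream by hand.
sys.stdout.write("another line of progress\n")
sys.stdout.flush()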

Keith Prussing
  • Unfortunately reading directly from the pipe also buffers and does not resolve the issue of getting a line by line output. – Bryan Apperson May 09 '16 at 20:02
  • The interleaving of the output can be handled by not skipping the rest of the loop like I originally had. You have `bufsize=0` in the original, which should do the trick. Are you sure the external process is not buffering? – Keith Prussing May 09 '16 at 20:11
  • Yes, 100% - `os.environ["PYTHONUNBUFFERED"] = "1"` resolves the issue. I was originally using `proc.stderr.readline()` but then switched to the direct `io.open` method to debug where input is buffered. I realized there is buffering occurring on the subprocess.PIPE file object itself. From the comments in subprocess: **bufsize, if given, has the same meaning as the corresponding argument to the built-in open() function: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size. A negative bufsize means to use the system default** – Bryan Apperson May 09 '16 at 20:18
  • It would seem this is not being honored for one reason or another. – Bryan Apperson May 09 '16 at 20:20
  • And that's where you have me stumped. In the past, I've worked around issues like this by using `bufsize=0`, or, if that didn't work, making sure the slave command was not buffered. – Keith Prussing May 09 '16 at 20:26
  • Looking around, I came across http://stackoverflow.com/questions/230751/how-to-flush-output-of-python-print question and the associated answers. It appears like the subordinate script (if Python) will need to explicitly flush the buffers in order to be unbuffered by the calling script. Otherwise, you have to use `os.environ["PYTHONUNBUFFERED"]="1"`. – Keith Prussing May 09 '16 at 20:38
  • That was the trick, to export the environment value to the shell in which the command was being called. – Bryan Apperson May 09 '16 at 20:54