
I am trying to read the output of a subprocess called from Python. To do this I am using Popen (because I do not think it is possible to pipe stdout if using subprocess.call).

As of now I have two ways of doing it which, in testing, seem to provide the same results. The code is as follows:

with Popen(['Robocopy', source, destination, '/E', '/TEE', '/R:3', '/W:5', '/log+:log.txt'], stdout=PIPE) as Robocopy:
    for line in Robocopy.stdout:
        line = line.decode('ascii')
        message_list = [item.strip(' \t\n').replace('\r', '') for item in line.split('\t') if item != '']
        print(message_list[0], message_list[2])
    Robocopy.wait()
    returncode = Robocopy.returncode

and

with Popen(['Robocopy', source, destination, '/E', '/TEE', '/R:3', '/W:5', '/log+:log.txt'], stdout=PIPE, universal_newlines=True, bufsize=1) as Robocopy:
    for line in Robocopy.stdout:
        message_list = [item.strip() for item in line.split('\t') if item != '']
        print(message_list[0], message_list[2])
    Robocopy.wait()
    returncode = Robocopy.returncode

The first method does not include universal_newlines=True, and therefore does not set bufsize either, as the documentation states that line buffering is only usable in text mode, i.e. with universal_newlines=True.

The second version does include universal_newlines and therefore I specify a bufsize.

Can somebody explain the difference to me? I can't find the article, but I did read about an overflowing buffer causing problems, and hence the importance of using for line in stdout.

Additionally, when looking at the output, not specifying universal_newlines makes stdout a bytes object - but I am not sure what difference that makes (in terms of newlines and tabs) if I just decode the bytes object as ascii, compared to universal_newlines mode.

Lastly, setting bufsize to 1 makes the output "line-buffered", but I am not sure what that means. I would appreciate an explanation of how these various elements tie together. Thanks

Startec

1 Answer


What is the difference between using universal_newlines=True (with bufsize=1) and using default arguments with Popen

The default values are universal_newlines=False and bufsize=-1. universal_newlines=False means input/output is accepted as bytes, not Unicode strings, and the universal newlines handling (hence the name of the parameter; Python 3.7 provides the text alias that might be more intuitive here) is disabled -- you get binary data as is (unless the POSIX layer on Windows messes it up). bufsize=-1 means the streams are fully buffered -- the default buffer size is used.
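To make the bytes-vs-text difference concrete, here is a minimal sketch (not from the original code) that uses a tiny Python one-liner via sys.executable as a portable stand-in for the Robocopy child:

```python
import sys
from subprocess import Popen, PIPE

# Portable stand-in for the Robocopy child: prints two lines.
child = [sys.executable, '-c', "print('hello'); print('world')"]

# Defaults (universal_newlines=False, bufsize=-1): stdout yields bytes,
# with the platform's line endings intact (b'\r\n' on Windows).
with Popen(child, stdout=PIPE) as p:
    raw_lines = list(p.stdout)

# Text mode (universal_newlines=True): stdout yields str,
# with '\r\n'/'\r' line endings normalized to '\n'.
with Popen(child, stdout=PIPE, universal_newlines=True) as p:
    text_lines = list(p.stdout)

print(raw_lines)   # bytes objects, e.g. [b'hello\n', b'world\n']
print(text_lines)  # ['hello\n', 'world\n'] on any platform
```

The same two-line iteration works in both cases; only the type of `line` (and the newline handling) changes.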

universal_newlines=True uses the locale.getpreferredencoding(False) character encoding to decode bytes (which may differ from the ascii encoding used in your code).
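You can check what text mode would use on your system; a quick sketch (the encoding names in the comment are only examples of common values):

```python
import locale

# The encoding universal_newlines=True uses to decode the child's output:
enc = locale.getpreferredencoding(False)
print(enc)  # e.g. 'UTF-8' on most Linux setups, 'cp1252' on many Windows ones

# A byte such as 0xe9 is valid latin-1/cp1252 but is not valid ascii:
assert b'\xe9'.decode('latin-1') == '\u00e9'  # 'é'
try:
    b'\xe9'.decode('ascii')
except UnicodeDecodeError:
    print('0xe9 is not decodable as ascii')
```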

If universal_newlines=False then for line in Robocopy.stdout: iterates over b'\n'-separated lines. If the process uses a non-ascii encoding e.g., UTF-16 for its output, then even if os.linesep == '\n' on your system you may get a wrong result. If you want to consume text lines, use the text mode: pass universal_newlines=True or use io.TextIOWrapper(process.stdout) explicitly.
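A sketch of the io.TextIOWrapper variant, again with a hypothetical Python child standing in for Robocopy; wrapping the pipe yourself also lets you name the encoding explicitly instead of relying on the locale default:

```python
import io
import sys
from subprocess import Popen, PIPE

# Hypothetical child that emits one tab-separated line (Robocopy-style).
child = [sys.executable, '-c', "print('10%\\tNew File\\tsome.txt')"]

with Popen(child, stdout=PIPE) as p:
    # Wrap the binary pipe in a text layer with an explicit encoding.
    for line in io.TextIOWrapper(p.stdout, encoding='utf-8'):
        fields = [item.strip() for item in line.split('\t') if item != '']
        print(fields)  # ['10%', 'New File', 'some.txt']
```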

The second version does include universal_newlines and therefore I specify a bufsize.

In general, it is not necessary to specify bufsize if you use universal_newlines (you may, but it is not required), and you don't need to specify bufsize in your case. bufsize=1 enables line-buffered mode (the input buffer is flushed automatically on newlines when you write to process.stdin); otherwise it is equivalent to the default bufsize=-1.
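A sketch of what line buffering means on the stdin side, using a hypothetical echo child (this matters only when you write to the process, which the Robocopy code above never does):

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical child that echoes one line from stdin back to stdout.
child = [sys.executable, '-c',
         "import sys; sys.stdout.write(sys.stdin.readline())"]

with Popen(child, stdin=PIPE, stdout=PIPE,
           universal_newlines=True, bufsize=1) as p:
    # bufsize=1 (line buffering, text mode only): the write is flushed as
    # soon as the newline is written, so no explicit p.stdin.flush() is
    # needed before reading the reply.
    p.stdin.write('ping\n')
    reply = p.stdout.readline()
    print(reply)  # 'ping\n'
```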

jfs
  • So the default value of `bufsize=-1` seems like it could cause an issue. Fully buffering the stream could cause some sort of blocking correct? And if it is not specified (in universal_newlines mode) again don't I create the possibility of blocking because of a full buffer? – Startec Jul 04 '16 at 10:53
  • @Startec: no. There is no blocking issue here (regardless of `universal_newlines`, `bufsize` values). Where do you get these ideas? If you have an issue with some specific code then ask the question about this specific code. – jfs Jul 04 '16 at 10:56
  • My apologies, my confusion came from the documentation for subprocess (i.e. `Do not use stdout=PIPE or stderr=PIPE with this function. The child process will block if it generates enough output to a pipe to fill up the OS pipe buffer as the pipes are not being read from.`) But I now see that is for subprocess.call. Thanks for your clear answer - it addresses my question(s) – Startec Jul 04 '16 at 11:03
  • @Startec: yes. If you don't read from the `process.stdout` pipe (when `stdout=PIPE`) then the child process may block: the OS pipe buffer is finite (at least on some systems), so as soon as the child process fills it, it won't be able to write any more (until you drain the buffer by reading from your end of the pipe). Note: the OS pipe buffer is **outside** your parent Python script; it has nothing to do with `bufsize` (which controls the buffer **inside** the parent Python script), [look at the picture](http://stackoverflow.com/a/31953436/4279) – jfs Jul 04 '16 at 11:11