
I'm using `subprocess.Popen` to run different scripts from another Python script, and some of them hang at some point until they are killed by the scheduler script. I tried setting `bufsize` to 100 * 1024 * 1024, but that didn't solve the problem. If I ignore all the output by setting `stdout` and `stderr` to `subprocess.DEVNULL`, it doesn't get stuck.

Example:

# For most scripts this works, but for long running and very verbose scripts, it gets stuck
subprocess.Popen(args=args_list, bufsize=100*1024*1024, stderr=subprocess.PIPE, stdout=subprocess.DEVNULL)

# This works fine, but I don't get the stderr content
subprocess.Popen(args=args_list, stderr=subprocess.DEVNULL, stdout=subprocess.DEVNULL)

Thanks for your help!

  • Do you actually communicate with the subprocess, e.g. by reading stdout/stderr or using `.communicate()`? – MisterMiyagi Aug 26 '20 at 18:49
  • Yes! After a while I call `.communicate()` and select the second element from the returned list. I just need this so I can send an alarm in case of failure with the error to the script maintainer – Yago Carvalho Aug 27 '20 at 17:39

1 Answer


Large buffers should only be configured for performance, and never relied on as a correctness measure.

That means that using a large bufsize is not a suitable replacement for ensuring that you can handle reading from a subprocess's stdout and stderr in whichever order contents become available. An operating system isn't guaranteed to support arbitrarily-sized I/O buffers; you should never assume that just because you request a buffer of a given size, you're going to actually receive it.

On Linux, for example, some kernel versions support the F_SETPIPE_SZ fcntl to request a given pipe buffer size. However, unprivileged processes can't set this to any value larger than the fs.pipe-max-size sysctl, so the request may fail; your program needs to be prepared for such a failure.
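A minimal sketch of that failure mode, assuming Linux and Python 3.10+ (where the `fcntl` module exposes the `F_SETPIPE_SZ` constant); on older versions or other platforms the `AttributeError`/`OSError` branch is taken:

```python
import fcntl
import os

r, w = os.pipe()
try:
    # Linux-only; the kernel may round the size up, and will refuse values
    # above fs.pipe-max-size for unprivileged processes.
    new_size = fcntl.fcntl(w, fcntl.F_SETPIPE_SZ, 1024 * 1024)
    print("pipe buffer is now", new_size, "bytes")
except (AttributeError, OSError):
    # The request can legitimately fail; treat a bigger buffer as an
    # optimization, never as a correctness guarantee.
    print("could not resize pipe buffer")
finally:
    os.close(r)
    os.close(w)
```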


Thus, use the general best practices for reading from a subprocess, no matter what your buffer size is.

That means, in something like a rough order of preference:

  • Using `Popen.communicate()` wherever you can.
  • Combining stdout and stderr into a single file descriptor (`stderr=subprocess.STDOUT`), if you don't actually need to distinguish between them.
  • Reading from both descriptors, either asynchronously or concurrently via threads, so that writes are consumed in whichever order they arrive.
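A minimal sketch of the first option, which also matches the goal from the comments (getting stderr for alerting); the inline `-c` script is a stand-in for the real verbose script:

```python
import subprocess
import sys

# Stand-in for a long-running, verbose script; replace with your real args_list.
args_list = [sys.executable, "-c", "import sys; sys.stderr.write('boom')"]

proc = subprocess.Popen(
    args_list,
    stdout=subprocess.DEVNULL,  # output we don't care about
    stderr=subprocess.PIPE,     # errors we want to report
    text=True,
)
# communicate() drains the pipe until EOF, so the child can never block
# on a full stderr buffer, no matter how much it writes.
_, err = proc.communicate()
if proc.returncode != 0 or err:
    print("stderr was:", err.strip())
```

Because `communicate()` reads the pipe continuously rather than waiting for the process to finish first, the deadlock in the original code can't occur regardless of `bufsize`.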


Charles Duffy