
I'm using `subprocess.Popen` to run different scripts from another Python script, and some of them hang at some point until they are killed by the scheduler script. I tried setting `bufsize` to 100 * 1024 * 1024, but that didn't solve the problem. If I ignore all the output by setting `stdout` and `stderr` to `subprocess.DEVNULL`, it doesn't get stuck.

Example:

# For most scripts this works, but for long running and very verbose scripts, it gets stuck
subprocess.Popen(args=args_list, bufsize=100*1024*1024, stderr=subprocess.PIPE, stdout=subprocess.DEVNULL)

# This works fine, but I don't get the stderr content
subprocess.Popen(args=args_list, stderr=subprocess.DEVNULL, stdout=subprocess.DEVNULL)

Thanks for your help!

  • Do you actually communicate with the subprocess, e.g. by reading stdout/stderr or using `.communicate()`? – MisterMiyagi Aug 26 '20 at 18:49
  • Yes! After a while I call `.communicate()` and select the second element from the returned list. I just need this so I can send an alarm in case of failure with the error to the script maintainer – Yago Carvalho Aug 27 '20 at 17:39

1 Answer


Large buffers should only be configured for performance, and never relied on as a correctness measure.

That means that using a large bufsize is not a suitable replacement for ensuring that you can handle reading from a subprocess's stdout and stderr in whichever order contents become available. An operating system isn't guaranteed to support arbitrarily-sized I/O buffers; you should never assume that just because you request a buffer of a given size, you're going to actually receive it.

On Linux, for example, some kernel versions support the F_SETPIPE_SZ fcntl to request a given pipe buffer size. However, unprivileged processes can't set this to any value larger than the fs.pipe-max-size sysctl, so the request may fail; your program needs to be prepared for such a failure.
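A minimal sketch of that failure mode, assuming Linux and Python 3.10+ (where the `fcntl` module exposes the `F_SETPIPE_SZ` constant); on older versions or other platforms the `AttributeError`/`OSError` branch is taken:

```python
import fcntl
import os

r, w = os.pipe()
try:
    # Linux-only; the kernel may round the size up, and will refuse values
    # above fs.pipe-max-size for unprivileged processes.
    new_size = fcntl.fcntl(w, fcntl.F_SETPIPE_SZ, 1024 * 1024)
    print("pipe buffer is now", new_size, "bytes")
except (AttributeError, OSError):
    # The request can legitimately fail; treat a bigger buffer as an
    # optimization, never as a correctness guarantee.
    print("could not resize pipe buffer")
finally:
    os.close(r)
    os.close(w)
```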


Thus, use the general best practices for reading from a subprocess, no matter what your buffer size is.

That means, in something like a rough order of preference:

  • Using `Popen.communicate()` wherever you can.
  • Combining stdout and stderr into a single file descriptor (`stderr=subprocess.STDOUT`), if you don't actually need to distinguish between them.
  • Reading from both descriptors, either asynchronously or concurrently via threads, so that writes are consumed in whichever order they arrive.
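A minimal sketch of the first option, which also matches the goal from the comments (getting stderr for alerting); the inline `-c` script is a stand-in for the real verbose script:

```python
import subprocess
import sys

# Stand-in for a long-running, verbose script; replace with your real args_list.
args_list = [sys.executable, "-c", "import sys; sys.stderr.write('boom')"]

proc = subprocess.Popen(
    args_list,
    stdout=subprocess.DEVNULL,  # output we don't care about
    stderr=subprocess.PIPE,     # errors we want to report
    text=True,
)
# communicate() drains the pipe until EOF, so the child can never block
# on a full stderr buffer, no matter how much it writes.
_, err = proc.communicate()
if proc.returncode != 0 or err:
    print("stderr was:", err.strip())
```

Because `communicate()` reads the pipe continuously rather than waiting for the process to finish first, the deadlock in the original code can't occur regardless of `bufsize`.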


Charles Duffy