17

I have a problem piping a simple subprocess.Popen.

Code:

import subprocess
cmd = 'cat file | sort -g -k3 | head -20 | cut -f2,3'
p = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE)
for line in p.stdout:
    print(line.decode().strip())

Output for file ~1000 lines in length:

...
sort: write failed: standard output: Broken pipe
sort: write error

Output for file >241 lines in length:

...
sort: fflush failed: standard output: Broken pipe
sort: write error

Output for file <241 lines in length is fine.

I have been reading the docs and googling like mad but there is something fundamental about the subprocess module that I'm missing ... maybe to do with buffers. I've tried p.stdout.flush() and playing with the buffer size and p.wait(). I've tried to reproduce this with commands like 'sleep 20; cat moderatefile' but this seems to run without error.

user438383
mathtick
  • ... and p2.communicate() also works but I think it may cause problems if the output is large. – mathtick Nov 05 '10 at 14:45
  • 'New code' very helpful. Love that I can use the exact same piped command I used when testing in the shell. Two suggestions: 1) make plural: run_shell_commands 2) either remove, comment out, or add debug=false around print statements inside function – PeterVermont May 22 '13 at 00:42
  • Thanks. Ran into the same broken pipe issue with files over a certain size. Used your code and it works like a charm. – poof Jul 03 '13 at 18:30
  • don't put the answer in your question, post it as an answer instead. btw, the code may deadlock if any of the commands produce enough output on stderr. You should close `stdout_old` in the parent after passing it to `Popen` to allow SIGPIPE upstream (it should kill `sort` instead of producing EPIPE). See also ['yes' reporting error with subprocess communicate()](http://stackoverflow.com/q/22077881/4279) – jfs Mar 16 '14 at 14:34
  • can you reproduce the error on current Python versions: 2.7 and 3.3? – jfs Mar 16 '14 at 14:44
  • related: [Replacing shell pipeline](http://docs.python.org/2/library/subprocess.html#replacing-shell-pipeline) – jfs Mar 16 '14 at 14:45
  • When I passed stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE to Popen(), I got no output or error from (o, e) = p.communicate(), but if I don't pass stdin=subprocess.PIPE I get both the error and the output. – Birbal Sain Apr 29 '21 at 15:09

5 Answers

14

From the recipes on subprocess docs:

from subprocess import Popen, PIPE

# To replace shell pipeline like output=`dmesg | grep hda`
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
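
Applied to the four-stage pipeline from the question, the same recipe might look like the sketch below. The function name `top_rows` and its parameters are hypothetical, and it assumes a POSIX system with `cat`, `sort`, `head` and `cut` on the PATH. The parent closes its copy of each upstream pipe so that an upstream process can be stopped by SIGPIPE once the downstream one exits, and it reads the final stdout line by line instead of holding everything in one string.

```python
from subprocess import Popen, PIPE

def top_rows(path, n=20):
    """Sketch of: cat path | sort -g -k3 | head -n N | cut -f2,3 (no shell)."""
    p1 = Popen(["cat", path], stdout=PIPE)
    p2 = Popen(["sort", "-g", "-k3"], stdin=p1.stdout, stdout=PIPE)
    p1.stdout.close()  # parent drops its copy of the pipe feeding p2
    p3 = Popen(["head", "-n", str(n)], stdin=p2.stdout, stdout=PIPE)
    p2.stdout.close()  # parent drops its copy of the pipe feeding p3
    p4 = Popen(["cut", "-f2,3"], stdin=p3.stdout, stdout=PIPE)
    p3.stdout.close()  # parent drops its copy of the pipe feeding p4

    # Iterate over the last pipe rather than reading it into one big string.
    lines = [raw.decode().rstrip("\n") for raw in p4.stdout]
    p4.stdout.close()

    # Reap every child so none is left as a zombie.
    for proc in (p1, p2, p3, p4):
        proc.wait()
    return lines
```

Calling `top_rows("file")` would return the selected columns as a list of strings; stderr of the children is left attached to the parent's stderr here, so any messages from `sort` would still be visible.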
Paulo Scardine
  • The shell wasn't causing the problem but for some reason splitting the commands in the "right" place seems to fix it. Thanks! – mathtick Nov 05 '10 at 14:45
  • @mathtick: you should indeed iterate over PIPE instead of attributing a large output to some string instance, otherwise you risk an out-of-memory exception. – Paulo Scardine Nov 05 '10 at 20:16
5

This is because you shouldn't use "shell pipes" in the command passed to subprocess.Popen. Use subprocess.PIPE to connect the processes instead, like this:

from subprocess import Popen, PIPE

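# Without shell=True each command must be an argument list, not a single string.
p1 = Popen(['cat', 'file'], stdout=PIPE)
p2 = Popen(['sort', '-g', '-k', '3'], stdin=p1.stdout, stdout=PIPE)
p3 = Popen(['head', '-20'], stdin=p2.stdout, stdout=PIPE)
p4 = Popen(['cut', '-f2,3'], stdin=p3.stdout, stdout=PIPE)  # stdout=PIPE so p4.stdout exists
final_output = p4.stdout.read()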

But I have to say that what you're trying to do could be done in pure Python instead of calling a bunch of shell commands.
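
As a rough illustration of the pure-Python route, here is a minimal sketch. The name `top_rows_pure` is hypothetical, and it makes simplifying assumptions that differ from GNU sort: fields are split on tabs only, the key is just the third field rather than "field 3 to end of line", keys that do not parse as numbers are placed first, and ties keep their input order.

```python
import heapq

def top_rows_pure(lines, n=20):
    """Roughly: sort -g -k3 | head -n N | cut -f2,3 on tab-separated lines."""
    def numeric_key(line):
        fields = line.rstrip("\n").split("\t")
        try:
            return float(fields[2])
        except (IndexError, ValueError):
            # Assumption: missing or non-numeric keys sort before all numbers.
            return float("-inf")

    # nsmallest keeps only n candidates in memory instead of sorting everything.
    smallest = heapq.nsmallest(n, lines, key=numeric_key)

    # Keep columns 2 and 3 (1-based), like cut -f2,3.
    return ["\t".join(row.rstrip("\n").split("\t")[1:3]) for row in smallest]
```

Because `heapq.nsmallest` accepts any iterable, an open file object can be passed directly, for example `with open("file") as fh: rows = top_rows_pure(fh)`. Whether this is fast enough for millions of lines would need measuring.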

mdeous
  • I am grepping 13+ million lines return 100k+ lines of matches, sorting, cutting and taking a "head". This takes seconds in the shell. It was taking forever in python. I have tried read() and I thought I tried splitting commands but I think it's the same problem. Will post back after testing more ... – mathtick Nov 05 '10 at 14:21
  • Splitting the commands seems to have fixed it, even if I still use shell=True. – mathtick Nov 05 '10 at 14:52
1

I have been having the same error. Even put the pipe in a bash script and executed that instead of the pipe in Python. From Python it would get the broken pipe error, from bash it wouldn't.

It seems to me that the last command before head (here, sort) reports an error because its STDOUT is closed when head exits. Python must be picking up on this, whereas in the shell the error is silent. I changed my code to consume the entire input and the error went away.

This would also explain why smaller files work: the pipe probably buffers the entire output before head exits, so sort never writes to a closed pipe. Larger files overflow that buffer, which would explain the breakage.

For example, instead of 'head -1' (I only wanted the first line), I used awk 'NR == 1', which reads all of its input.

There are probably better ways of doing this depending on where the 'head -X' occurs in the pipe.
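
A plausible mechanism for the difference, given here as background rather than something confirmed in this thread: CPython sets SIGPIPE to "ignored" at startup, and an ignored signal stays ignored in child processes. A child such as sort then gets an EPIPE error from write() and prints "Broken pipe", instead of being stopped quietly by SIGPIPE as it would be under an ordinary shell. On Python 3.2 and later, Popen's restore_signals parameter (true by default) resets SIGPIPE in the child. The sketch below does the same explicitly with preexec_fn, which also works on older versions. The name `run_pipeline` is hypothetical, and the code assumes a POSIX system.

```python
import signal
import subprocess

def run_pipeline(cmd):
    """Run a shell pipeline with SIGPIPE set back to its default action."""
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        # Runs in the child just before exec: undo Python's SIG_IGN for SIGPIPE.
        preexec_fn=lambda: signal.signal(signal.SIGPIPE, signal.SIG_DFL),
    )
    out, err = proc.communicate()  # reads both pipes to EOF, then waits
    return out.decode(), err.decode()

# head exits after three lines while sort still has output left to write.
out, err = run_pipeline("seq 1 100000 | sort -g | head -3")
```

With the default disposition restored, `err` should not contain the "Broken pipe" messages shown in the question. Note that the Python docs caution against preexec_fn in programs that use threads.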

0

You don't need shell=True. Don't invoke the shell. This is how I would do it:

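# cmd must be an argument list for a single program, e.g. ['sort', '-g', '-k3', 'file'];
# a string containing '|' is only treated as a pipeline when a shell runs it.
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
stdout_value = p.communicate()[0]  # the output, as bytes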

See whether you still hit the buffer problem after using this.
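
To make the no-shell form concrete, here is a minimal sketch that turns one command string into an argument list with shlex.split and runs it. The helper name `run_single` is hypothetical, and it only covers a single program: a string containing '|' would pass the pipe character to the program as a literal argument, because no shell is there to interpret it.

```python
import shlex
import subprocess

def run_single(command):
    """Run one program without a shell and return its stdout as text."""
    # 'sort -g -k3 data.txt' -> ['sort', '-g', '-k3', 'data.txt']
    argv = shlex.split(command)
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    out, _ = proc.communicate()  # reads stdout to EOF and waits for exit
    return out.decode()

text = run_single("echo 'hello world'")
```

Here `text` holds whatever the program printed; the quoting in the command string is handled by shlex.split, not by a shell.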

user225312
  • The shell didn't seem to be causing the problem. Splitting the commands in the right place seems to have fixed it (see update). Thanks! – mathtick Nov 05 '10 at 14:47
0

Try using communicate() rather than reading directly from stdout.

The Python docs say this:

"Warning Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process."

http://docs.python.org/library/subprocess.html#subprocess.Popen.stdout

p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
output = p.communicate()[0]  # communicate is a method: call it, then take stdout
for line in output.splitlines():
    pass  # do stuff
Corey Goldberg
  • I tried p.communicate()[0] but this did not fix the problem. Splitting the commands appropriately did (see above). I still do not really understand why this has fixed things. – mathtick Nov 05 '10 at 14:51