
About the example found here

output=`dmesg | grep hda`

becomes:

p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]

How to expand it to a 3-process pipeline?

Also, I could not find what p1.stdout.close() does exactly. What if p1 runs for a long time? Will the call wait for p1 to finish before applying the close()?

Must close() come before communicate() ?

Do I understand correctly that before I call communicate(), the pipeline is set up but "on hold"? Or rather, is each process started in parallel right away, with the ones that require input from stdin blocking until communicate() is called?
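
(A minimal sketch, not part of the original question, of one way to check this empirically, assuming a Unix-like system with a sleep command. If the processes start immediately, Popen should return right away, while communicate() does the actual waiting:)

import time
from subprocess import Popen, PIPE

start = time.time()
p1 = Popen(["sleep", "2"], stdout=PIPE)
# Popen returns immediately; the child is already running.
print("Popen returned after %.3fs" % (time.time() - start))

p1.communicate()  # communicate() is what actually waits for the child
print("communicate() returned after %.3fs" % (time.time() - start))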


Consider:

output=`dmesg | grep hda | grep bla`

Maybe something like this:

p1 = Popen(["dmesg"], stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.

p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p2.stdout.close()  # Allow p2 to receive a SIGPIPE if p3 exits.

p3 = Popen(["grep", "bla"], stdin=p2.stdout, stdout=PIPE)

output = p3.communicate()[0]

(the one above crashes with ValueError: I/O operation on closed file in its current form, because p1.stdout is already closed by the time it is passed to p2 as stdin)

This one does not throw an error, but since I don't understand what the close() does, it may be a setup for doom sometime later:

p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)

p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.

p3 = Popen(["grep", "bla"], stdin=p2.stdout, stdout=PIPE)

p2.stdout.close()  # Allow p2 to receive a SIGPIPE if p3 exits.

output = p3.communicate()[0]

Or this:

p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p3 = Popen(["grep", "bla"], stdin=p2.stdout, stdout=PIPE)

p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
p2.stdout.close()  # Allow p2 to receive a SIGPIPE if p3 exits.

output = p3.communicate()[0]
Zoltan K.
1 Answer


According to section 17.1.4.2 of the Python documentation (https://docs.python.org/2/library/subprocess.html), calling p1.stdout.close() after starting p2 is important in order for p1 to receive a SIGPIPE if p2 exits before p1. Likewise, p2.stdout.close() should be called after starting p3.
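
As an illustrative sketch (not taken from the linked documentation) of how that rule generalizes to a pipeline of N commands: close each process's stdout in the parent only after the next process has been started with it.

from subprocess import Popen, PIPE

commands = [["dmesg"], ["grep", "hda"], ["grep", "bla"]]

procs = [Popen(commands[0], stdout=PIPE)]
for cmd in commands[1:]:
    procs.append(Popen(cmd, stdin=procs[-1].stdout, stdout=PIPE))
    # Close the parent's copy of the previous stdout now that the
    # next process holds its own descriptor for it.
    procs[-2].stdout.close()

output = procs[-1].communicate()[0]

Closing the parent's copy does not affect the child that was started with it; each child keeps its own descriptor, and dropping the parent's copy is exactly what lets EOF and SIGPIPE propagate through the chain.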

The question "How to handle a broken pipe (SIGPIPE) in Python?" explains how to handle a broken pipe (SIGPIPE).

  • This answer is not very clear on how I should solve the problem. I have read the docs, and it is not very clear to me what happens in the background when I call these methods. I see many possible solutions that conform with the guidelines provided in the docs, which is why I am looking for an answer (preferably a code sample) that not only works but also won't come back later with an obscure every-second-Sunday error. – Zoltan K. Jun 12 '18 at 14:06
  • Pipes are used to communicate between processes. If p1 writes data to p2 and p2 is not there anymore, then a SIGPIPE signal will be sent to p1, and p1 will be able to handle it. Python recommends what I have answered in order to guarantee this behavior. – Marco Aurélio Falcão Jun 12 '18 at 14:20
  • I'm still not clear on the actionable form of your answer. Are you saying that if I have a pipeline of N processes, p1 | p2 | p3 | ... | pN, in Python/subprocess I should put a p1.stdout.close() anywhere I prefer, as long as it comes after Popen(p2)? Can I put it before communicate() or after communicate(), and it does not matter? Can I even call p3.stdout.close() before p2.stdout.close(), and it will be ok? Which placement do you use? – Zoltan K. Jun 12 '18 at 15:05
  • You've suggested two alternatives. I would use the first one. I am new to Python; I have experience programming processes and pipes in C. If you know processes in Python, you could create two processes p1 and p2: p1 loops, writing data to p2 through a pipe, and p2 reads data from p1. After a time set by a timer, p2 exits. p1 would then try to send data to p2, and p2 would not be there to receive it. An exception like SIGPIPE should be raised (a rough sketch of this experiment follows below). – Marco Aurélio Falcão Jun 12 '18 at 16:26
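
A rough sketch of the experiment described in the last comment, assuming a Unix-like system with the head command (illustrative code, not from the thread). Note that Python ignores SIGPIPE by default, so the parent sees the broken pipe as an EPIPE exception rather than being killed by the signal:

from subprocess import Popen, PIPE

# head -n 1 reads a single line and exits; writes after that hit a
# broken pipe, which Python surfaces as an exception (EPIPE).
p = Popen(["head", "-n", "1"], stdin=PIPE)
try:
    for i in range(100000):
        p.stdin.write(("line %d\n" % i).encode())
        p.stdin.flush()
except IOError as e:  # BrokenPipeError on Python 3
    print("writer got a broken pipe: %s" % e)
finally:
    try:
        p.stdin.close()
    except IOError:
        pass  # closing may flush buffered data into the broken pipe
    p.wait()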