Replacing Shell Pipeline

Question

In the Python 2.7 documentation of subprocess module, I found the following snippets:

from subprocess import Popen, PIPE

p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]

Source : https://docs.python.org/2/library/subprocess.html#replacing-shell-pipeline

I don't understand this line: p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.

Here p1.stdout is being closed. How does it allow p1 to receive a SIGPIPE if p2 exits?

score 5 · Answer 1 · answered Jan 06 '15 at 11:22

The SIGPIPE signal is normally sent if a process attempts to write to a pipe from which no active process is looking. In the shell pipeline equivalent of your code snippet:

`dmesg | grep hda`

If the grep process for some reason terminates before dmesg is done writing output, dmesg will receive a SIGPIPE and terminate itself. This would be the expected behavior for UNIX/Linux processes (http://en.wikipedia.org/wiki/Unix_signal).

By contrast, in the Python implementation using subprocess, if p2 exits before p1 is done generating output, the SIGPIPE doesn't get sent because there is actually still a process looking at the pipe - the Python script itself (the one which created p1 and p2). More importantly, the script is looking at the pipe but not consuming its contents - the effect is that the pipe is held open indefinitely and p1 gets stuck in limbo.

Explicitly closing p1.stdout detaches the Python script from the pipe and makes it such that no process other than p2 is looking at the pipe - that way if p2 does end before p1, p1 properly gets the signal to end itself without anything artificially holding the pipe open.

Here is an alternatively worded explanation: http://www.enricozini.org/2009/debian/python-pipes/

"a pipe from which no active process is looking" and "still a process looking at the pipe" are not really precise wordings. — Dr. Jan-Philip Gehrcke, Jan 06 '15 at 13:50

Dr. Jan-Philip Gehrcke · Answer 2 · 2015-01-06T13:51:38.330

A hopefully more systematic explanation:

A pipe is an instance managed by the operating system. It has a single read end and a single write end.
Both ends can be opened by multiple processes. There is still only one pipe, though. That is, multiple processes can share the same pipe.
A process that has opened one of the ends holds a corresponding file handle. The process can actively close() it again! If a process exits, the operating system closes the corresponding file handle for you.
All involved processes can close() their file handle representing the read end of the pipe. There is nothing wrong with that; it is a perfectly fine situation.
Now, if a process writes data to the write end of the pipe and the read end is no longer open (no process holds an open file handle for the read end), a POSIX-compliant operating system sends a SIGPIPE signal to the writing process to let it know that there is no reader anymore.
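The rule in the last point can be tried out inside a single Python process. This is only a minimal sketch: the helper name write_with_readers is made up for illustration, and os.dup() is used as a stand-in for a second process that also holds the read end. It assumes Python 3 on a POSIX system. Because CPython ignores SIGPIPE by default, the failed write shows up as a BrokenPipeError (EPIPE) instead of killing the interpreter.

```python
import os

def write_with_readers(keep_reader_open):
    """Write one byte to a pipe after closing the original read descriptor."""
    read_fd, write_fd = os.pipe()
    # os.dup() stands in for a second process that also holds the read end.
    extra_reader = os.dup(read_fd) if keep_reader_open else None
    os.close(read_fd)
    try:
        os.write(write_fd, b"x")
        return "written"
    except BrokenPipeError:
        # CPython ignores SIGPIPE, so the kernel reports EPIPE to the writer.
        return "broken pipe"
    finally:
        os.close(write_fd)
        if extra_reader is not None:
            os.close(extra_reader)

print(write_with_readers(keep_reader_open=True))   # a reader handle still exists
print(write_with_readers(keep_reader_open=False))  # no reader handle is left
```

As long as any handle for the read end exists, the write succeeds even though nobody actually reads. That is the situation the parent script creates when it keeps p1.stdout open.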

This is the standard mechanism by which the receiving program can implicitly tell the sending program that it has stopped reading. Have you ever wondered if

cat bigfile | head -n5

actually reads the entire bigfile? No, it does not, because cat receives a SIGPIPE signal the next time it writes to the pipe after head has exited (head exits after reading 5 lines from stdin). The important thing to appreciate: cat has been designed to actually respond to SIGPIPE (that is an important engineering decision ;)): it stops reading the file and exits. Other programs are designed to ignore SIGPIPE (on purpose, these handle this situation on their own -- this is common in networking applications).

If you keep the read end of the pipe open in your controlling process, you disable the mechanism described above: dmesg will not be able to notice that grep has exited.

However, your example actually is not a good one. grep hda will read the entire input. dmesg is the process that exits first.
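A pipeline in which the reader really does exit first makes the effect visible. The following is a sketch, not the documentation's example: it swaps in `yes | head -n 5`, assumes the `yes` and `head` utilities are installed, and assumes Python 3, where Popen restores the default SIGPIPE disposition in the child (restore_signals=True is the default). The function name run_yes_head is hypothetical.

```python
import signal
from subprocess import Popen, PIPE

def run_yes_head(lines=5):
    """Rough equivalent of `yes | head -n 5`, returning head's output and yes's status."""
    p1 = Popen(["yes"], stdout=PIPE)
    p2 = Popen(["head", "-n", str(lines)], stdin=p1.stdout, stdout=PIPE)
    p1.stdout.close()  # the parent gives up its handle on the read end
    output = p2.communicate()[0]
    # A negative return code -N means the child was terminated by signal N.
    return output, p1.wait()

output, yes_status = run_yes_head()
print(output.count(b"\n"), yes_status == -signal.SIGPIPE)
```

Here yes would run forever on its own, so the fact that p1.wait() returns at all shows that the writer was told that its reader is gone.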

Ujjwal · Answer 3 · 2015-01-06T14:28:37.407

From the docs:

The p1.stdout.close() call after starting the p2 is important in order for p1 to receive a SIGPIPE if p2 exits before p1.

The SIGPIPE signal is sent to a process when it attempts to write to a pipe whose read end is not open in any process. When p2 is created using stdin=p1.stdout, two processes hold the read end of the pipe behind p1.stdout: the parent Python process and p2. Even if p2 exits prematurely, the parent process still holds the read end open, so the SIGPIPE signal is not sent. p1.stdout.close() closes the read end in the parent/caller process, leaving p2 (grep) as the only process with the read end open; dmesg holds only the write end.

In other words, if there is no p1.stdout.close() then:

p1.stdout remains open in the parent process. If p2 exits (i.e. no one reads p1.stdout anymore), p1 does not find out and keeps writing to p1.stdout until the corresponding OS pipe buffer is full, at which point its write blocks.
If p2 exits prematurely, p1.stdout is still open in the parent process, so SIGPIPE is not generated.
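The two points above can be observed with a writer that never stops by itself. This is a hedged sketch under the same assumptions as any such experiment: `yes` and `head` are available, Python 3 is used (so the child gets the default SIGPIPE handling), and the half-second pause is an arbitrary value chosen only to give yes time to fill the pipe buffer. The function name run_without_early_close is made up.

```python
import signal
import time
from subprocess import Popen, PIPE

def run_without_early_close(pause=0.5):
    """Keep p1.stdout open in the parent while p2 exits, then close it late."""
    p1 = Popen(["yes"], stdout=PIPE)
    p2 = Popen(["head", "-n", "1"], stdin=p1.stdout, stdout=PIPE)
    p2.communicate()           # head has exited; the parent still holds the read end
    time.sleep(pause)          # yes fills the pipe buffer and then blocks
    still_running = p1.poll() is None
    p1.stdout.close()          # the last handle on the read end goes away
    status = p1.wait()         # the blocked write now fails and yes is signalled
    return still_running, status

still_running, status = run_without_early_close()
print(still_running, status == -signal.SIGPIPE)
```

Until the parent closes its handle, yes is neither finished nor signalled; it simply sits in a blocked write.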
