
How can you diff two pipelines without using temporary files in Bash? Say you have two command pipelines:

foo | bar
baz | quux

And you want to find the diff in their outputs. One solution would obviously be to:

foo | bar > /tmp/a
baz | quux > /tmp/b
diff /tmp/a /tmp/b

Is it possible to do so without the use of temporary files in Bash? You can get rid of one temporary file by piping one of the pipelines into diff:

foo | bar > /tmp/a
baz | quux | diff /tmp/a -

But you can't pipe both pipelines into diff simultaneously (not in any obvious manner, at least). Is there some clever trick involving /dev/fd to do this without using temporary files?

Adam Rosenfield

3 Answers


A one-liner with two temporary files (not what you want) would be:

 foo | bar > file1.txt && baz | quux > file2.txt && diff file1.txt file2.txt

With bash, though, you can use process substitution:

 diff <(foo | bar) <(baz | quux)

 foo | bar | diff - <(baz | quux)  # or only use process substitution once

With diff -u, the second version more clearly reminds you which input was which, by showing something like --- /dev/stdin vs. +++ /dev/fd/63 in the header instead of two numbered fds. (Plain diff without -u prints no file names at all.)
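A self-contained version you can run as is. The functions left and right are hypothetical stand-ins for foo | bar and baz | quux. diff exits with status 1 when its inputs differ, so the || branch reports that instead of treating it as an error.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the question's foo | bar and baz | quux.
left()  { printf '%s\n' cherry apple banana | sort; }
right() { printf '%s\n' date banana cherry | sort; }

# Both pipelines as process substitutions: no temporary files.
diff <(left) <(right) || echo "inputs differ (diff exit status $?)"

# One side on stdin; -u adds a header line naming each input.
left | diff -u - <(right) || echo "inputs differ (diff exit status $?)"
```

The first diff prints a normal-format diff (apple only on the left, date only on the right); the second shows the same change in unified format.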


Not even a named pipe appears in the filesystem, at least on OSes where bash can implement process substitution with filenames like /dev/fd/63. Such a name lets the command open and read from a file descriptor that bash already set up before exec'ing the command. (That is, bash calls pipe(2) before fork, then uses dup2 to connect the output of quux to an input file descriptor for diff, fd 63.)

On a system with no "magical" /dev/fd or /proc/self/fd, bash might use named pipes to implement process substitution, but it would at least manage them itself, unlike temporary files, and your data wouldn't be written to the filesystem.

You can check how bash implements process substitution with echo <(true), which prints the filename instead of reading from it. It prints /dev/fd/63 on a typical Linux system. For more detail on exactly which system calls bash uses, this command traces file and file-descriptor system calls on a Linux system:

strace -f -efile,desc,clone,execve bash -c '/bin/true | diff -u - <(/bin/true)'
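A quicker check that needs no strace, assuming Linux and bash: the substituted name can be read like any file path, but the test operator -p reports that it is a pipe, and -f confirms it is not a regular file on disk.

```shell
#!/usr/bin/env bash
# Print the name bash substitutes (commonly /dev/fd/63 on Linux, not guaranteed).
echo <(true)

# The name can be read like any file path...
cat <(echo "read through a process substitution")

# ...but it refers to a pipe (-p), not a regular file (-f).
if [ -p <(true) ]; then echo "pipe"; fi
if [ -f <(true) ]; then echo "regular file"; fi
```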

Without bash, you could make a named pipe. Use - to tell diff to read one input from STDIN, and use the named pipe as the other:

mkfifo file1_pipe.txt
foo | bar > file1_pipe.txt &   # background: opening a FIFO for writing blocks until a reader opens it
baz | quux | diff file1_pipe.txt -
rm file1_pipe.txt
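A slightly more defensive sketch of the same FIFO approach, assuming a POSIX shell with mkfifo and mktemp available. The helper name fifo_diff and the stand-in functions left and right are hypothetical. The FIFO lives in a private directory created by mktemp -d, the writer runs in the background because opening a FIFO for writing blocks until a reader opens it, and the cleanup runs whether or not diff finds differences.

```shell
#!/bin/sh
# fifo_diff CMD1 CMD2: hypothetical helper that diffs the output of two
# commands through a named pipe instead of a temporary file.
fifo_diff() {
    dir=$(mktemp -d) || return 2
    mkfifo "$dir/left" || { rmdir "$dir"; return 2; }

    # Writer in the background: opening a FIFO for writing blocks
    # until a reader (diff) opens the other end.
    "$1" > "$dir/left" &

    # Reader: diff opens the FIFO and reads the second input from stdin.
    "$2" | diff "$dir/left" -
    rc=$?

    wait                # reap the background writer
    rm -rf "$dir"       # remove the FIFO whether or not the inputs differed
    return "$rc"
}

# Hypothetical stand-ins for foo | bar and baz | quux.
left()  { printf '%s\n' apple banana cherry; }
right() { printf '%s\n' apple blueberry cherry; }

fifo_diff left right || echo "inputs differ"
```

fifo_diff passes diff's exit status through, so callers can still tell "same" (0) from "different" (1).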

Note that the reverse, sending one output to multiple destinations, is done with the tee command:

ls *.txt | tee /dev/tty txtlist.txt 

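The above command displays the output of ls *.txt on the terminal and also writes it to the text file txtlist.txt. (tee also copies its input to its own standard output, so if that is the terminal too, the listing appears there twice.)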

But with process substitution, you can use tee to feed the same data into multiple pipelines:

cat *.txt | tee >(foo | bar > result1.txt)  >(baz | quux > result2.txt) | foobar
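A runnable sketch of that fan-out, with two sed commands as hypothetical stand-ins for foo | bar and baz | quux. One detail worth knowing: bash starts the >( ) processes before tee's own > /dev/null redirection takes effect, so inside a pipeline their output flows into the next command (here sort), which only finishes once both have exited. Bash generally does not wait for >( ) processes by itself, so when they write to files, as in the command above, those files may not be complete when the next command starts.

```shell
#!/usr/bin/env bash
# Feed one stream into two pipelines at once.
# The sed commands are hypothetical stand-ins for foo | bar and baz | quux.
# tee's own copy goes to /dev/null; the >( ) processes were started before
# that redirection, so their output goes down the pipe into sort.
seq 3 | tee >(sed 's/^/a: /') >(sed 's/^/b: /') > /dev/null | LC_ALL=C sort
```

sort is used here only to put the two interleaved outputs into a predictable order.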
Peter Cordes
VonC
  • even without bash, you can use temporary fifo's `mkfifo a; cmd >a& cmd2|diff a -; rm a` – unhammer Jun 10 '13 at 10:49
  • You can use a regular pipe for one of the args: `pipeline1 | diff -u - <(pipeline2)`. Then the output will more clearly remind you which input was which, by showing `-- /dev/stdin` vs. `++ /dev/fd/67` or something, instead of two numbered fds. – Peter Cordes Mar 05 '18 at 04:36
  • process substitution (`foo <( pipe )`) doesn't modify the filesystem. **The pipe is *anonymous*; it has no name in the filesystem**. The shell uses the `pipe` system call to create it, not `mkfifo`. Use `strace -f -efile,desc,clone,execve bash -c '/bin/true | diff -u - <(/bin/true)'` to trace file and file-descriptor system calls if you want to see for yourself. On Linux, `/dev/fd/63` is part of the `/proc` virtual filesystem; it automatically has entries for every file descriptor, and it isn't a copy of the contents. So you can't call that a "temporary file" unless `foo 3 – Peter Cordes Mar 05 '18 at 04:47
  • @PeterCordes Good points. I have included your comment in the answer for more visibility. – VonC Mar 05 '18 at 08:00
  • Why not just fix your first big paragraph, instead of leaving in the errors and only posting corrections? Note that Daniel Cassidy deleted his answer a year after posting it, presumably because it was wrong. – Peter Cordes Mar 05 '18 at 08:24
  • @PeterCordes I will leave any edit to you: that is what makes Stack Overflow interesting: anyone can "fix" an answer. – VonC Mar 05 '18 at 08:35
  • I definitely like the SO philosophy of fixing existing answers instead of always having to post new ones. Especially when it's already accepted and highly voted. – Peter Cordes Mar 08 '18 at 05:31

In bash you can run each command pipeline in its own subshell by enclosing it in parentheses. You can then prefix each one with < (with no space before the opening parenthesis) so that bash connects its output to a pipe and passes diff a filename for that pipe, such as /dev/fd/63.

For example:

diff <(foo | bar) <(baz | quux)

These pipes are managed by bash, so they are created and destroyed automatically (unlike temporary files).

BenM
  • Much more detailed than my redaction on the same solution -- anonymous batch --. +1 – VonC Dec 06 '08 at 10:38
  • This is called [process substitution](https://www.gnu.org/software/bash/manual/html_node/Process-Substitution.html) in Bash. – Franklin Yu Apr 14 '16 at 04:51

Some people arriving at this page might be looking for a line-by-line diff, for which comm or grep -f should be used instead.
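For that kind of line-based comparison, a small sketch with made-up input: comm -23 prints the lines that appear only in the first input and needs both inputs sorted the same way (LC_ALL=C keeps sort and comm consistent), while grep -vxFf keeps the first stream's original order and needs no sorting.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # same collation for sort and comm

# Lines only in the first input: comm needs sorted inputs.
comm -23 <(printf '%s\n' c a b | sort) <(printf '%s\n' b d | sort)

# Same set of lines with grep, keeping the first input's order:
# -v invert, -x whole-line match, -F fixed strings, -f read patterns from a file.
printf '%s\n' c a b | grep -vxFf <(printf '%s\n' b d)
```

comm prints a and c; grep prints c and a, in the order they arrived.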

One thing to point out is that in all of the other answers' examples, the diff won't actually start until both streams have finished. Test this with, e.g.:

comm -23 <(seq 100 | sort) <(seq 10 20 && sleep 5 && seq 20 30 | sort)

If this is an issue, you could try sd (stream diff), which requires neither sorting (as comm does) nor process substitution (as the examples above do), is orders of magnitude faster than grep -f, and supports infinite streams.

The test example I propose would be written like this in sd:

seq 100 | sd 'seq 10 20 && sleep 5 && seq 20 30'

The difference is that seq 100 would be diffed against the output of seq 10 20 right away, without waiting for the sleep. Note that if one of the streams is a tail -f, the diff cannot be done with process substitution, because that stream never ends.

Here's a blog post I wrote about diffing streams on the terminal, which introduces sd.

mlg