
(via https://stackoverflow.com/a/8624829/23582)

How does (head; tail) < file work? Note that cat file | (head;tail) doesn't.

Also, why does (head; wc -l) < file give 0 for the output of wc?

Note: I understand how head and tail work. Just not the subtleties involved with these particular invocations.

zellyn
  • Why do you say `cat file | (head;tail)` doesn't? It seems to work for me. And why do you say that `(head; wc -l) < file` gives `0` for the output of `wc`? That also works for me. What system are you using? What version of Bash? Are there particular files it fails on? Do the files have fewer than 10 lines, between 10 and 20, or more than 20 lines? – Brian Campbell Dec 05 '12 at 07:29
  • Just to add to the confusion: Mac OS X 10.7.5, bash 4.2.37(2). I see the first and last 10 lines with file redirection, but only the first 10 lines with the pipeline. In *both* cases, `wc -l` is returning 0. – chepner Dec 05 '12 at 13:04
  • Ubuntu 12.10, bash 4.2.37(1) here: Both redirection and pipeline work, and `wc -l` gives me `number_of_lines_in_the_file - 10`, which I guess is expected? I didn't know about this syntax, very cool. – Linus Thiel Dec 05 '12 at 13:28
  • I was trying it out on Mac OS, with bash. – zellyn Dec 05 '12 at 20:02
  • Xfce 4.8, bash 4.2.25, pipelines sometimes work. `$ curl -s http://www.wikipedia.org/ | ( head -n 2; echo "#--#"; tail -n 1; )` works fine, but `$ yes | head -12 | cat -n | ( head -n 2; echo "#--#"; tail -n 1; )` only shows head output. It is probably the head buffer issue mentioned in the answer, because `$ yes | head -5000 | cat -n | ( head -n 2; echo "#--#"; tail -n 1; )` works just fine. – Stephen Sep 25 '13 at 07:40

4 Answers


OS X

For OS X, you can look at the source code for head and the source code for tail to figure out some of what's going on. In the case of tail, you'll want to look at forward.c.

So, it turns out that head doesn't do anything special. It just reads its input using the stdio library, so it reads a buffer at a time and might read too much. This means cat file | (head; tail) won't work for small files where head's buffering makes it read some (or all) of the last 10 lines.

On the other hand, tail checks the type of its input file. If it's a regular file, tail seeks to the end and reads backwards until it finds enough lines to emit. This is why (head; tail) < file works on any regular file, regardless of size.
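The regular-file-versus-pipe distinction can be observed from the shell itself. This is only a sketch of the idea, not how tail is written: it assumes a shell where /dev/stdin can be tested (bash handles it specially; other shells need the device to exist), and stdin_kind is a hypothetical helper name invented for this illustration.

```shell
# Report what kind of file standard input is connected to.
# stdin_kind is a hypothetical helper used only for this illustration.
stdin_kind() {
  if [ -f /dev/stdin ]; then
    echo "regular file"
  elif [ -p /dev/stdin ]; then
    echo "pipe"
  else
    echo "other"
  fi
}

# Temporary sample file (the location is chosen by mktemp).
sample=$(mktemp)
seq 1 12 > "$sample"

kind_redirect=$(stdin_kind < "$sample")     # stdin is the file itself
kind_pipe=$(cat "$sample" | stdin_kind)     # stdin is the read end of a pipe

echo "redirect: $kind_redirect"
echo "pipe:     $kind_pipe"
rm -f "$sample"
```

With redirection the commands inside the parentheses get the file itself as stdin, which is what makes seeking possible; with cat in front they only ever see a pipe.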

Linux

You could look at the source for head and tail on Linux too, but it's easier to just use strace, like this:

(strace -o /tmp/head.trace head; strace -o /tmp/tail.trace tail) < file

Take a look at /tmp/head.trace. You'll see that the head command tries to fill a buffer (of 8192 bytes in my test) by reading from standard input (file descriptor 0). Depending on the size of file, it may or may not fill the buffer. Anyway, let's assume that it reads 10 lines in that first read. Then, it uses lseek to back up the file descriptor to the end of the 10th line, essentially “unreading” any extra bytes it read. This works because the file descriptor is open on a normal, seekable file. So (head; tail) < file will work for any seekable file, but it won't make cat file | (head; tail) work.

On the other hand, tail does not (in my testing) seek to the end and read backwards, like it does on OS X. At least, it doesn't read all the way back to the beginning of the file.
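The effect of head's "unreading" can be checked without strace by letting wc count whatever head leaves behind. This is a minimal sketch, assuming seq and mktemp are available; the variable names are illustrative, and the numbers you see depend on which head implementation is installed.

```shell
# Build a 12-line sample file in a temporary location.
sample=$(mktemp)
seq 1 12 > "$sample"

# Regular file: head and wc share one open file, so wc starts reading at
# whatever offset head leaves behind.
from_file=$( { head -n 10 >/dev/null; wc -l; } < "$sample" )

# Pipe: head cannot seek backwards, so any extra bytes it buffered are
# gone by the time wc runs.
from_pipe=$( cat "$sample" | { head -n 10 >/dev/null; wc -l; } )

echo "lines left for wc after head (file): $from_file"
echo "lines left for wc after head (pipe): $from_pipe"
rm -f "$sample"
```

A head that seeks back, as the trace above shows on Linux, leaves the last 2 lines for wc in the file case. A head that does not seek back leaves nothing for an input this small, which would explain wc reporting 0.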

Here's my test. Create a small, 12-line input file:

yes | head -12 | cat -n > /tmp/file

Then, try (head; tail) < /tmp/file on Linux. I get this with GNU coreutils 5.97:

     1  y
     2  y
     3  y
     4  y
     5  y
     6  y
     7  y
     8  y
     9  y
    10  y
    11  y
    12  y

But on OS X, I get this:

     1  y
     2  y
     3  y
     4  y
     5  y
     6  y
     7  y
     8  y
     9  y
    10  y
     3  y
     4  y
     5  y
     6  y
     7  y
     8  y
     9  y
    10  y
    11  y
    12  y
rob mayoff
  • Thanks, fantastic answer! My coworker and I had come to the conclusion that it involved buffering (by doing seq 10000 to generate streams, and noticing that it cut over at the 4096th character), but your answer is much more comprehensive, and the "checking if it's a regular file" makes sense of the difference between piping and redirecting from a file. – zellyn Dec 06 '12 at 07:00

The parentheses here create a subshell, a separate shell process that runs the commands inside. What is interesting is that the subshell has a single stdin and stdout, which every command inside it inherits. head runs first, prints the first 10 lines, and exits. tail then runs on the same stdin, reads what remains, and prints the last 10 lines. Because both commands write to the subshell's stdout, their outputs appear combined.

It's worth mentioning that the same effect can be achieved with command grouping, like { head; tail; } < file, which is cheaper because it doesn't create a subshell.
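A quick way to compare the two forms is to capture the output of each and check that they match. This is a sketch under the assumption that seq and mktemp are available; the 30-line file and the -n 3 counts are arbitrary choices for the example.

```shell
# Temporary 30-line sample file.
sample=$(mktemp)
seq 1 30 > "$sample"

# Subshell form: the commands run in a separate shell process.
subshell_out=$( (head -n 3; tail -n 3) < "$sample" )

# Grouping form: the commands run in the current shell.
# Note the required spaces inside the braces and the final semicolon.
group_out=$( { head -n 3; tail -n 3; } < "$sample" )

if [ "$subshell_out" = "$group_out" ]; then
  echo "both forms produced the same output"
fi
rm -f "$sample"
```

Both forms hand the same stdin to head and then to tail, so whatever the installed head and tail do with a shared file offset, they do it identically in each case.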

Samus_

All of these should work as expected if the file is sufficiently large. The head command consumes a certain amount of the input (not just what it needs, because it buffers its input), and if that doesn't leave enough input for the tail command, it won't work.

Another concern is that the pipe results in both sides executing in parallel and so the producing side might cause the consuming side's head command to read a different amount every time it is run.

Compare multiple runs of the following command:

for i in `seq 1 10`; do echo "foo"; done | (head -n1; wc -l)

The wc command may see a different amount of the input on each run.

When using < to provide input, this parallelism doesn't exist: the shell opens the file and the commands read from it directly, so there is no producer process running alongside head.
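To make the run-to-run variation easier to see, the same idea can be repeated with a larger input and the counts collected side by side. This is a sketch only: seq 1 100000 and the five repetitions are arbitrary values chosen for the example, and whether the counts actually differ depends on the system and its scheduling.

```shell
# Run the same pipeline several times and record how many lines wc sees.
counts=""
for run in 1 2 3 4 5; do
  # head prints the first line (discarded here); wc counts what is left
  # after head's buffered read.
  n=$(seq 1 100000 | { head -n 1 >/dev/null; wc -l; })
  counts="$counts $n"
done
echo "lines seen by wc on each run:$counts"
```

Each count is always below 100000, because head consumes at least the first line. How far below depends on how much data was in the pipe when head did its read.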

ashirley
  • I see the same output on each run of that command. However, by changing the number 10 to higher values, I can trigger different behavior. – zellyn Dec 05 '12 at 20:56

The head command displays the first 10 lines of a file by default, and the tail command displays the last 10. If the file has only 3 lines, that is no problem: both commands display those lines. If the file has more than 10 lines, each command displays only its default 10 lines. The default number of lines can be changed with the -n, n, +n options (refer to the man page).

prabu
  • My question was not how head and tail work, but about the subtleties of the particular invocation. – zellyn Dec 05 '12 at 20:02