In a reply to "Piping a file through tail and head via tee", strange behaviour of head was observed in the following construct when working with huge files:

#! /bin/bash
for i in {1..1000000} ; do echo $i ; done > /tmp/n

( tee >(sed -n '1,3p'        >&3 ) < /tmp/n | tail -n2 ) 3>&1 # Correct
echo '#'
( tee >(tac | tail -n3 | tac >&3 ) < /tmp/n | tail -n2 ) 3>&1 # Correct
echo '#'
( tee >(head -n3             >&3 ) < /tmp/n | tail -n2 ) 3>&1 # Not correct!?

Output:

1
2
3
999999
1000000
#
1
2
3
999999
1000000
#
1
2
3
15504
15

Question:

Why doesn't the last command output the same lines as the previous two commands?

codeforester
choroba

1 Answer

This is because head exits as soon as it has output the first three lines. tee is then killed by SIGPIPE, because the reading end of the pipe it is writing to (the one bash substitutes for the >(...) "file") has been closed, but not before it has managed to write some lines to its stdout.

If you execute just this:

tee >(head -n3 >/dev/null) < /tmp/n

you will see what happens more clearly.
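
To put numbers on this, here is a minimal sketch that counts how many lines tee manages to pass through and records its exit status. It assumes bash and a tee that is killed by (or exits on) the broken pipe; the temporary file from mktemp stands in for /tmp/n, and the variable names are only illustrative. The exact line count varies from run to run and between systems.

```bash
#!/bin/bash
# Scratch file standing in for /tmp/n from the question (illustrative choice).
tmp=$(mktemp)
printf '%s\n' {1..1000000} > "$tmp"

# Count how many complete lines tee writes to stdout before it stops.
passed=$(( $(tee >(head -n3 >/dev/null) < "$tmp" | wc -l) ))

# Run it again, this time keeping tee's own exit status.
tee >(head -n3 >/dev/null) < "$tmp" > /dev/null
status=$?

echo "lines passed through: $passed of 1000000"
# A status of 141 means 128 + 13, i.e. the process was killed by SIGPIPE.
echo "tee exit status: $status"

rm -f "$tmp"
```

A count far below 1000000 together with a non-zero status is consistent with tee being stopped shortly after head exits.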

On the other hand, tac has to read the whole file before it can reverse it, and sed -n '1,3p' also reads its input to the end, because nothing in that script tells it to quit early.
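
The difference between a reader that consumes everything and one that quits early can be sketched with sed alone. This assumes bash and a sed that supports the standard q command; the scratch file and variable names are illustrative, and the second count will differ between runs.

```bash
#!/bin/bash
# Scratch file standing in for /tmp/n from the question (illustrative choice).
tmp=$(mktemp)
printf '%s\n' {1..1000000} > "$tmp"

# sed -n '1,3p' has no quit command, so it reads to end of input and tee can finish.
full=$(( $(tee >(sed -n '1,3p' >/dev/null) < "$tmp" | wc -l) ))

# sed '3q' quits after line 3, much like head -n3, so tee is cut short.
short=$(( $(tee >(sed '3q' >/dev/null) < "$tmp" | wc -l) ))

echo "with sed -n '1,3p': $full lines reach stdout"
echo "with sed '3q':      $short lines reach stdout"

rm -f "$tmp"
```

So the behaviour is not specific to head: any reader that closes its end of the pipe before tee is done has the same effect.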

spbnick
  • Thanks. I understand now. I can even add `cat` like this `( tee >(head -n3 >&3; cat > /dev/null ) < /tmp/n | tail -n2 ) 3>&1` to make it work. – choroba May 21 '13 at 08:56
  • You're welcome :) Although, I'd say using `sed` for that part would be clearer. – spbnick May 21 '13 at 09:01
  • Note that for files smaller than 5 lines at least some of them will get output twice. – spbnick May 21 '13 at 09:03
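
The workaround and the caveat from the comments above can be checked with a short sketch. It assumes bash; the scratch file and variable names are illustrative and not part of the original commands.

```bash
#!/bin/bash
# Scratch file standing in for /tmp/n from the question (illustrative choice).
tmp=$(mktemp)
printf '%s\n' {1..1000000} > "$tmp"

# After head exits, cat keeps reading and discards the rest,
# so the pipe tee writes to stays open until tee is done.
fixed=$( ( tee >(head -n3 >&3; cat >/dev/null) < "$tmp" | tail -n2 ) 3>&1 )
echo "$fixed"

# Small-input caveat: with 4 lines, head prints lines 1-3 and tail prints lines 3-4.
printf '%s\n' 1 2 3 4 > "$tmp"
dupes=$( ( tee >(head -n3 >&3; cat >/dev/null) < "$tmp" | tail -n2 ) 3>&1 | sort | uniq -d )
echo "printed twice: $dupes"

rm -f "$tmp"
```

For the small input the relative order of the head and tail output is not guaranteed, which is why the sketch only looks for duplicated lines rather than comparing the full output.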