
Some coding experiments (made while attempting to find a shorter answer to a coding question) led to a few interesting surprises:

seq 2 | while head -n 1 ; do : ; done

Output (hit Control-C or it'll waste CPU cycles forever):

1
^C

The same, but using a redirected input file instead of piped input:

seq 2 > two
while head -n 1 ; do : ; done < two

Output (hit Control-C):

1
2
^C

Questions:

  1. Why does the while loop not stop the way seq 2 | head -n 1 would?

  2. Why would redirected input produce more output than piped input?


The above code was tested with dash and bash on a recent Lubuntu. Both seq and head are from the coreutils (version 8.25-2ubuntu2) package.

A way to avoid having to hit Control-C:

timeout .1 sh -c "seq 2 > two ; while head -n 1 ; do : ; done < two"

1
2

timeout .1 sh -c "seq 2 | while head -n 1 ; do : ; done"

1

agc
  • If nothing else, it's a method of distinguishing redirected input from piped input. Except the '_Control-C_' is inconvenient. – agc Jun 16 '16 at 14:03
  • Funny, and a still funnier thing is that if you use the `<<` redirect this effect disappears. – Joce Jun 16 '16 at 14:07
  • On OS X I cannot reproduce the effect. Even with `< two` I still get only `1` – Matteo Jun 16 '16 at 14:25
  • What do you think `head -n 1` does when given an empty file on its stdin? – Charles Duffy Jul 06 '16 at 02:32
  • @CharlesDuffy, do you mean 1) `head -n 1 /dev/null`, or plain old 2) `head -n` (wait for user to do something)? – agc Jul 06 '16 at 02:44
  • @Matteo, on linux `head -n 1 /dev/null ; echo $?` returns `0`, is it the same with OS X? – agc Jul 06 '16 at 02:47
  • @agc, I meant the former. `head -n 1 /dev/null` -- what's its exit status? And when the thing in the condition part of a while loop sees that exit status, what action do you expect it to take? – Charles Duffy Jul 06 '16 at 02:47
  • @CharlesDuffy, `0`. – agc Jul 06 '16 at 02:54
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/116532/discussion-between-charles-duffy-and-agc). – Charles Duffy Jul 06 '16 at 02:54
  • You may find [this answer](http://stackoverflow.com/a/13736974/77567) informative. – rob mayoff Jul 06 '16 at 03:06
  • Related question: [Pipes, how do data flow in a pipeline?](https://unix.stackexchange.com/questions/182232/pipes-how-do-data-flow-in-a-pipeline/182242#182242) – agc Jul 30 '17 at 19:58

2 Answers


head -n 1, when given an empty stream on stdin, is well within its rights and specification to immediately exit with a successful exit status.

Thus:

seq 2 | while head -n 1 ; do : ; done

...can legally loop forever, as head -n 1 is not required to exit with a nonzero status and thus terminate the loop. (A nonzero exit status is only required by the standard if "an error occurred", and a file having fewer lines than are requested for output is not defined as an error).

Indeed, this is explicit:

When a file contains less than number lines, it shall be copied to standard output in its entirety. This shall not be an error.
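A quick way to check this on a given system is to feed head an empty stream and look at its exit status. This is only a minimal sketch; the variable name status is illustrative. The shell's read builtin is shown for contrast, since it does fail at end of input.

```shell
# head prints nothing for an empty stream, yet reports success.
head -n 1 < /dev/null
status=$?
echo "head exit status: $status"

# By contrast, the shell's read builtin returns non-zero at end of input,
# so a while loop built on it stops by itself.
while IFS= read -r line; do
    printf '%s\n' "$line"
done < /dev/null
echo "read loop finished"
```

A loop such as while head -n 1 ; do : ; done therefore has nothing to stop it once the input is exhausted.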


Now, if your implementation of head, after its first invocation (printing the contents of the first line), leaves the file pointer queued up at the beginning of the second line when it exits (which it is absolutely not required to do), then the second loop iteration will read that second line and emit it. Again, however, this is an implementation detail, which depends on whether the folks writing your head implementation chose to either:

  1. Read an aggressively large block, but only emit a subset of it. (The more efficient implementation.)
  2. Or loop character-by-character to only consume a single line.

An implementer is well within their rights to decide which of those implementations to follow based on criteria only available at runtime.
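The difference between the two strategies can be observed from the shell. The sketch below assumes bash or dash, whose read builtin consumes only a single line from a pipe (strategy 2); what head leaves behind is implementation-dependent, so no particular result is claimed for it. The variable names are illustrative.

```shell
# Strategy 2: read takes exactly one line from the pipe,
# so the second line is still available to cat.
rest_after_read=$(printf '1\n2\n' | { IFS= read -r first; cat; })
printf 'left after read: %s\n' "$rest_after_read"

# Possibly strategy 1: head may read a whole block from the pipe and
# discard what it does not print. What remains for cat depends on
# the head implementation in use.
rest_after_head=$(printf '1\n2\n' | { head -n 1 > /dev/null; cat; })
printf 'left after head: %s\n' "$rest_after_head"
```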


Now, let's say your head always tries to read 8 KB blocks at a time. How, then, could it ever leave the pointer queued up for the second line? (Other than by seeking backwards, that is, which some implementations do when given a file, but which is not required by the standard; thanks to Rob Mayoff for the pointer here.)

This can happen if the concurrent invocation of seq has only written and flushed a single line as of when the first read occurs.

Obviously, this is a very timing-sensitive situation (a race condition), and it also depends on unspecified implementation details, such as whether seq flushes its output between lines. As seq is not specified by POSIX or any other standard, this varies completely between platforms.
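The timing dependence can be made visible by replacing seq with a deliberately slow producer. This is a sketch under the assumption that head reads whatever is currently in the pipe and then exits; the one-second sleep and the variable names are arbitrary choices for illustration.

```shell
# The producer writes "1", pauses, then writes "2". The first head most
# likely finds only the first line in the pipe, so "2" is still there
# for the second head.
slow=$( { printf '1\n'; sleep 1; printf '2\n'; } | { head -n 1; head -n 1; } )
printf 'slow producer:\n%s\n' "$slow"

# Here both lines are written at once, so the first head may consume
# both of them; the result depends on the head implementation.
fast=$( printf '1\n2\n' | { head -n 1; head -n 1; } )
printf 'fast producer:\n%s\n' "$fast"
```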

agc
Charles Duffy
  • That's a lot of background, if nobody else tops it, I'll green this later. Still, this answer is a bit *political* for my tastes, what with words like "rights", "not required", and the limits of standardization. – agc Jul 06 '16 at 03:44
  • As language lawyers go, I'm of the unrepentant variety: If a behavior isn't guaranteed by a specification, it can go away with any platform change, any software upgrade, any runtime environment modification, *without that change in behavior constituting a bug on anyone's part*. Language lawyering (for a wide reading of the term encompassing normative documentation in general) is critical: It helps you understand which behaviors are promises you can trust, and which are present -- or not -- at the whims of whoever last refactored the library you're using. – Charles Duffy Jul 06 '16 at 03:52
  • I wouldn't disagree... Re "without...a bug on anyone's part", that recalls two 19th century images depicting mysterious redistributions: ['Twas Him](http://www.harpweek.com/09Cartoon/BrowseByDateCartoon-Large.asp?Month=August&Date=19), and [Get off the Earth](http://www.moillusions.com/get-off-earth-optical-illusion/). – agc Jul 06 '16 at 05:21
  • On further reflection, some reservations... This is a good answer, and _does_ answer the question, but so freely intermixes generalizations with analysis of the specific coding example, that the generalizations tend to obscure the analysis, so that even a day later repeat readings are required. It would be better if the specific analysis came first, (regarding *coreutils* v8.25-2ubuntu2), and *after* that the survey of the output permutations made possible within the intentionally unspecified lacunae of POSIX. – agc Jul 08 '16 at 03:08

The accepted answer is correct: head does not return a non-zero exit status for any input, even empty input.

But I did discover some more curiosities.


I figured out a way to do it that halts correctly.

seq 10 | while head -c 4 | ifne -n false; do : ; done;

Sadly, there isn't much you can do with this construct, because head's output goes straight to the loop's standard output and never reaches the loop body.
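If the loop body does need to see each chunk, one possible workaround is to capture head's output in a variable inside the loop condition. The sketch below is not from the original construct: each_chunk is a hypothetical helper name, it assumes a head -c that reads no more bytes than requested from a pipe (as GNU head does), and it cannot handle NUL bytes because shell variables cannot hold them.

```shell
# Hypothetical helper: read stdin two bytes at a time and expose each
# chunk to the loop body. The trailing "x" protects trailing newlines,
# which command substitution would otherwise strip.
each_chunk() {
    while chunk=$(head -c 2; printf x) && [ "$chunk" != x ]; do
        printf '[%s]' "${chunk%x}"
    done
}

chunks=$(printf '12345678910' | each_chunk)
printf '%s\n' "$chunks"
```

The loop ends when head reads nothing, because the captured value is then just the sentinel "x".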

One use I found was to insert a character every x bytes (including after the tail):

/> printf '12345678910' | { while head -c 2 | ifne -n false; do printf 'a'; done; }
/> 12a34a56a78a91a0a

You should probably use sed 's/.\{4\}/&a/g' instead.

Here is a slightly more useful one that takes 2 bytes of input, "processes" them, then puts them somewhere:

printf '12345678910' | { while true; do head -c 2 < /dev/stdin | ifne -n false >> file.txt || break; done; }

You should probably use split --filter instead.

Another very weird use case arises when you call head inside the while loop with /dev/stdin:

/> printf '12345678910' | { while head -c 2 | ifne -n false; do head -c 3 </dev/stdin | ifne -n false >> every3.txt || break; done > every2.txt; }
/> cat every2.txt
12670
/> cat every3.txt
345891

As you can see, this alternates between taking 2 bytes and taking 3 bytes: 12 345 67 891 0.

You should probably use bbe instead.

You could possibly use it as a kind of poor man's progress indicator:

/> printf '12345678910' | while head -c 2 | ifne -n false; do echo "2 bytes travelled" > /dev/stderr ; done > /dev/null;
2 bytes travelled
2 bytes travelled
2 bytes travelled
2 bytes travelled
2 bytes travelled
2 bytes travelled # imperfect because actually only 1 byte travelled here

You should probably use pv instead.


What can you actually do with this construct...

¯\_(ツ)_/¯

WesAtWork
  • I feel like there is something this construct can do that is impossible or very difficult otherwise... If you have any use-cases, let me know! – WesAtWork Apr 14 '23 at 16:28