
This question helped me understand the difference between redirection and piping, but the examples focus on redirecting STDOUT (echo foo > bar.txt) and piping STDIN (ls | grep foo).

It would seem to me that any command that could be written my_command < file.txt could also be written cat file.txt | my_command. In what situations is STDIN redirection necessary?

Apart from the fact that using cat spawns an extra process and is less efficient than redirecting STDIN, are there situations in which you have to use the STDIN redirection? Put another way, is there ever a reason to pipe the output of cat to another command?

Quality Catalyst
Chap
  • Redirecting standard input is certainly *preferable* to the pipe version, since it doesn't spawn an unnecessary process. A better question would be when is the *pipe* necessary. – chepner Jan 25 '18 at 16:18
  • @chepner - a pipe is certainly necessary when a *command* is generating the data that's being read on STDIN by `my_command`. And I can see how redirecting a file to STDIN (or is it "redirecting STDIN to a file"?) is more efficient. What I want to know is whether there are cases in which you can't simply pipe a file to STDIN, and have to use the STDIN-redirect method. How might I better express that in my question? – Chap Jan 25 '18 at 16:31
  • Maybe the question should be: are `cat foo.txt | my_cmd` and `my_cmd < foo.txt` effectively synonymous? (Apart from the issue of preferability) – Chap Jan 25 '18 at 16:37
  • Redirection is never *necessary*, but your question implies you think the pipe is better and redirection should be avoided when possible. The exact opposite is true: you should use redirection when possible, and only use a pipe when necessary. The pipe uses `cat` to open the file for reading when the shell is perfectly capable of opening it on its own. – chepner Jan 25 '18 at 16:43
  • @chepner I didn't mean to imply I thought it was "better." - I'd literally never thought about it before. I found myself `cat`-ing a file into a command and wondered why I have always done that (and seen it done) instead of using STDIN redirection. It's been my experience that Unix doesn't often give one "more than one way to do it." Now all I have to do is learn new muscle memory. – Chap Jan 25 '18 at 16:53
  • If your program tries to `seek()` around or read a part of its input file more than once, it **can't** use the pipe. This means that, for example, versions of `sort` that can parallelize by starting multiple processes that handle different chunks of the input file can't offer that functionality at all when reading from `cat` (without, at least, reading everything from the FIFO into a temporary file first, rather than just being able to have each thread/subprocess seek() directly to a different piece of the input). – Charles Duffy Jan 25 '18 at 16:53
  • Consider `wc -c` for another example -- given a handle on a real file it can just use a `stat()`-family call to get the file's length in constant time no matter how long it takes. Given a pipe, it has to read the whole thing to the end and count bytes, so we're not just talking FIFO overhead but gotta-use-an-entirely-different-algorithm overhead. – Charles Duffy Jan 25 '18 at 16:55
  • (err, "no matter how long it takes" should have been "no matter how long that file is", of course). – Charles Duffy Jan 25 '18 at 17:09
  • Stack Overflow is a site for programming and development questions. This question appears to be off-topic because it is not about programming or development. See [What topics can I ask about here](http://stackoverflow.com/help/on-topic) in the Help Center. Perhaps [Super User](http://superuser.com/) or [Unix & Linux Stack Exchange](http://unix.stackexchange.com/) would be a better place to ask. – jww Jan 25 '18 at 19:14

2 Answers


What's the difference between my_command < file.txt and cat file.txt | my_command?

my_command < file.txt 

The redirection symbol can also be written as 0<, since it redirects file descriptor 0 (stdin) to file.txt instead of its current setting, which is probably the terminal. If my_command is a shell built-in then NO child processes are created; otherwise there is one.
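A small sketch of both points, assuming a made-up sample file (demo.txt and its contents are purely illustrative): `<` and `0<` behave identically, and a built-in such as read consumes the redirected stdin in the current shell, so the variable it sets is still there afterwards.

```shell
# Create a small sample file (hypothetical name and contents).
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmpdir/demo.txt"

# Both forms connect file descriptor 0 (stdin) to the file.
plain=$(wc -l < "$tmpdir/demo.txt")
explicit=$(wc -l 0< "$tmpdir/demo.txt")

# read is a shell built-in: it runs in the current shell,
# so first_line remains set after the command finishes.
read -r first_line < "$tmpdir/demo.txt"
echo "$first_line"

rm -rf "$tmpdir"
```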

cat file.txt | my_command

This connects file descriptor 1 (stdout) of the command on the left to the write end of an anonymous pipe, and file descriptor 0 (stdin) of the command on the right to the read end of that pipe.

We see at once that there is a child process, since cat is not a shell built-in. However, in bash, even if my_command is a shell built-in it is still run in a child process by default. Therefore we have TWO child processes.

So the pipe, in theory, is less efficient. Whether that difference is significant depends on many factors, including the definition of "significant". A pipe is preferable, however, to this alternative:

command1 > file.txt
command2 < file.txt

Here it is likely that

command1 | command2

is more efficient, remembering that, in practice, we will probably need a third child process in rm file.txt.
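The two approaches can be compared side by side. This is only a sketch: command1 and command2 are hypothetical stand-ins defined as shell functions, and the temporary file comes from mktemp.

```shell
# Hypothetical stand-ins for command1 and command2.
command1() { printf 'b\na\nc\n'; }
command2() { sort; }

# Temporary-file version: two separate steps plus cleanup.
tmpfile=$(mktemp)
command1 > "$tmpfile"
via_file=$(command2 < "$tmpfile")
rm -f "$tmpfile"

# Pipe version: both commands run concurrently and there is
# no intermediate file to create or remove.
via_pipe=$(command1 | command2)

echo "$via_pipe"
```

Both variables end up holding the same sorted text; the difference is only in how the data travels between the two commands.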

However, there are limitations to pipes. They are not seekable (random access, see man 2 lseek) and they cannot be memory mapped (see man 2 mmap). Some applications map files to virtual memory, but it would be unusual to do that to stdin or stdout. Memory mapping in particular is not possible on a pipe (whether anonymous or named) because a range of virtual addresses has to be reserved and for that a size is required.
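Whether a program can seek depends on what its stdin is actually connected to, and that can be inspected from the shell. A minimal sketch, assuming /dev/stdin is available (as on Linux); describe_stdin is a hypothetical helper written for this illustration, not a standard tool.

```shell
# Hypothetical helper: report what kind of object stdin is connected to.
describe_stdin() {
    if [ -p /dev/stdin ]; then
        echo "pipe"            # not seekable, cannot be memory mapped
    elif [ -f /dev/stdin ]; then
        echo "regular file"    # seekable, has a known size
    else
        echo "other"           # e.g. a terminal
    fi
}

tmpfile=$(mktemp)
printf 'one\ntwo\n' > "$tmpfile"

from_redirect=$(describe_stdin < "$tmpfile")
from_pipe=$(cat "$tmpfile" | describe_stdin)
echo "redirect: $from_redirect, pipe: $from_pipe"

rm -f "$tmpfile"
```

With redirection the command receives the file itself on descriptor 0, so lseek and mmap remain possible; with cat it only ever sees the pipe.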

Edit:

As mentioned by @JohnKugelman, a common error and source of many SO questions is the associated issue with a child process and redirection:

Take a file file.txt with 99 lines:

i=0
cat file.txt|while read
do
   (( i = i+1 ))
done

echo "$i"

What gets displayed? The answer is 0. Why? Because the count i = i + 1 is done in a subshell which, in bash, is a child process and does not change i in the parent (note: this does not apply to the Korn shell, ksh).

while read
do
   (( i = i+1 ))
done < file.txt

echo "$i"

This displays the correct count because no child processes are involved.
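When the input comes from a command rather than a file, bash's process substitution is one way to keep the loop in the current shell. A minimal sketch, assuming bash and a made-up three-line input in place of a real command:

```shell
# Requires bash: process substitution is not part of POSIX sh.
count=0
while read -r _line
do
    (( count = count + 1 ))
done < <(printf 'a\nb\nc\n')

# The loop ran in the current shell, so the updated count is visible here.
echo "$count"
```

The loop's stdin is still redirected rather than piped, so the while loop is not placed in a subshell and count keeps its value after done.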

cdarke
  • This helped by reminding me that STDIN can sometimes be attached to, and treated as, a _file_. I'm so used to thinking of it as a terminal, or the receiving end of a pipe, that I'd forgotten that. Thanks for a thorough answer. – Chap Jan 25 '18 at 17:19

You can of course replace any use of input redirection with a pipe that reads from cat, but it is inefficient to do so, as you are spawning a new process to do something the shell can already do by itself. However, not every instance of cat ... | my_command can be replaced with my_command < .... When cat is doing its intended job of concatenating two (or more) files, it is perfectly reasonable to pipe its output to another command.

cat file1.txt file2.txt | my_command
chepner
  • I disagree that this is true of *any* redirection without qualification. It's rare for programs to `fstat` or `seek` in their stdin (except as a performance optimization), but not impossible. Trying to remember the name of a standard GNU tool that, if given a pipe, copies the pipe's whole contents to a temporary file to have a seekable source before it's able to start its work... `shuf`, maybe? – Charles Duffy Jan 25 '18 at 16:56
  • @CharlesDuffy: these are exactly the situations I was interested in hearing about. – Chap Jan 25 '18 at 16:57