
I'm trying to pass information into a program that doesn't accept input from stdin. To do this, I'm passing /dev/stdin as the file argument and then trying to pipe in my input. I've noticed that if I do this with a pipe character:

[pkerp@comp ernwin]$ cat fess/structures/168d.pdb | MC-Annotate /dev/stdin

I get no output. If, however, I do the same thing using the less-than character (<), it works fine:

[pkerp@plastilin ernwin]$ MC-Annotate /dev/stdin < fess/structures/168d.pdb
Residue conformations -------------------------------------------
A1 : G C3p_endo anti
A2 : C C3p_endo anti
A3 : G C3p_endo anti

My question is, what is the difference between these two operations, and why do they give different outcomes? As a bonus question, is there a proper term for specifying input using the '<' symbol?

Update:

My current best guess is that something internal to the program being run makes use of seeking within the file. The answers below seem to suggest that it has something to do with the file pointers but running the following little test program:

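#include <stdio.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    FILE *f = fopen(argv[1], "r");
    if (f == NULL) {
        perror(argv[1]);
        return 1;
    }
    char line[128];

    printf("argv[1]: %s f: %d\n", argv[1], fileno(f));

    /* First pass: read the stream to the end. */
    while (fgets(line, sizeof(line), f)) {
        printf("line: %s\n", line);
    }

    /* Try to go back to the start. The return value of fseek is
       deliberately not checked, so a failed seek is silent. */
    printf("rewinding\n");
    fseek(f, 0, SEEK_SET);

    /* Second pass: prints lines again only if the seek succeeded. */
    while (fgets(line, sizeof(line), f)) {
        printf("line: %s\n", line);
    }
    fclose(f);
    return 0;
}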

indicates that everything occurs identically up until the fseek function call:

[pete@kat tmp]$ cat temp | ./a.out /dev/stdin
argv[1]: /dev/stdin f: 3
line: abcd

rewinding
===================
[pete@kat tmp]$ ./a.out /dev/stdin < temp
argv[1]: /dev/stdin f: 3
line: abcd

rewinding
line: abcd

Using process substitution as Christopher Neylan suggested leads to the program above hanging without even reading the input, which also seems a little strange.

[pete@kat tmp]$ ./a.out /dev/stdin <( cat temp )
argv[1]: /dev/stdin f: 3

Looking at the strace output confirms my suspicion that a seek operation is attempted which fails in the pipe version:

_llseek(3, 0, 0xffffffffffd7c7c0, SEEK_CUR) = -1 ESPIPE (Illegal seek)

And succeeds in the redirect version.

_llseek(3, 0, [0], SEEK_CUR)            = 0 
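The same probe that strace shows can be made explicitly before relying on a seek. The following is only a minimal sketch: the helper name stream_is_seekable is hypothetical and not part of MC-Annotate or any library, and it assumes a POSIX system. It asks for the current offset without moving it and treats ESPIPE as "not seekable".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the stream's descriptor supports
 * seeking, 0 if it does not (lseek fails with ESPIPE, as on a pipe),
 * and -1 on any other error. */
int stream_is_seekable(FILE *f)
{
    /* Query the current offset; this does not move the file position. */
    if (lseek(fileno(f), 0, SEEK_CUR) != (off_t)-1)
        return 1;
    return (errno == ESPIPE) ? 0 : -1;
}
```

Calling this on the FILE * returned by fopen(argv[1], "r") in the test program should distinguish the two invocations: 0 when the input arrives through a pipe, 1 when it is redirected from a regular file.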

The moral of the story: don't haphazardly replace a file argument with /dev/stdin and pipe to it. It might work, but it might just as well not.
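For a program whose source can be changed, one possible way to tolerate piped input is to copy the stream into a temporary file first and seek on that copy instead. This is a sketch under that assumption, not something MC-Annotate is known to do; spool_to_tmpfile is a hypothetical name, and only standard C library calls are used.

```c
#include <stdio.h>

/* Hypothetical helper: copies everything readable from `in` into an
 * anonymous temporary file and returns that file positioned at the
 * start, or NULL on failure. The copy is a regular file, so it can be
 * rewound even when `in` itself is a pipe. */
FILE *spool_to_tmpfile(FILE *in)
{
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return NULL;

    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, tmp) != n) {
            fclose(tmp);
            return NULL;
        }
    }

    /* Fail if reading stopped on an error or the copy cannot be rewound. */
    if (ferror(in) || fseek(tmp, 0L, SEEK_SET) != 0) {
        fclose(tmp);
        return NULL;
    }
    return tmp;
}
```

The cost is that the whole input is buffered on disk before processing starts, which may be unacceptable for very large or unbounded streams.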

juniper-
  • Check out the information over here: http://stackoverflow.com/questions/1312922/detect-if-stdin-is-a-terminal-or-pipe-in-c-c-qt – Atle May 10 '13 at 12:54
  • That's interesting, but it doesn't appear to be doing such a check. And regardless, shouldn't the output of isatty() be the same in both of the input cases mentioned here? This being in contrast to the linked post which has no redirected input. – juniper- May 10 '13 at 13:01
  • According to the second answer | returns YES for isatty while < returns NO. What MC-Annotate does, I cannot say. http://stackoverflow.com/a/7601564/2269047 – Atle May 10 '13 at 13:31
  • This also might be interesting: http://stackoverflow.com/questions/1563882/reading-a-file-name-from-piped-command – Atle May 10 '13 at 13:32

3 Answers


There should be no functional difference between those two commands. Indeed, I cannot recreate what you're seeing:

#! /usr/bin/perl
# test.pl
# this is a test Perl script that will read from a filename passed on the command line, and print what it reads.

use strict;
use warnings;

print $ARGV[0], " -> ", readlink( $ARGV[0] ), " -> ", readlink( readlink($ARGV[0]) ), "\n";
open( my $fh, "<", $ARGV[0] ) or die "$!";
while( defined(my $line = <$fh>) ){
        print "READ: $line";
}
close( $fh );

Running this the three ways:

(caneylan@faye.sn: tmp)$ cat input
a
b
c
d

(caneylan@faye.sn: tmp)$ ./test.pl /dev/stdin
/dev/stdin -> /proc/self/fd/0 -> /dev/pts/0
this is me typing into the terminal
READ: this is me typing into the terminal

(caneylan@faye.sn: tmp)$ cat input | ./test.pl /dev/stdin
/dev/stdin -> /proc/self/fd/0 -> pipe:[1708285]
READ: a
READ: b
READ: c
READ: d

(caneylan@faye.sn: tmp)$ ./test.pl /dev/stdin < input
/dev/stdin -> /proc/self/fd/0 -> /tmp/input
READ: a
READ: b
READ: c
READ: d

First note what /dev/stdin is:

(caneylan@faye.sn: tmp)$ ls -l /dev/stdin
lrwxrwxrwx 1 root root 15 Apr 21 15:39 /dev/stdin -> /proc/self/fd/0

(caneylan@faye.sn: tmp)$ ls -l /proc/self
lrwxrwxrwx 1 root root 0 May 10 09:44 /proc/self -> 27565

It's always a symlink to /proc/self/fd/0. /proc/self is itself a special link to the directory under /proc for the current process. So /dev/stdin will always point to fd 0 of the current process. So when you run MC-Annotate (or, in my examples, test.pl), the file /dev/stdin will resolve to /proc/$pid/fd/0, for whatever the process ID of MC-Annotate is. This is just a result of how the symlink for /dev/stdin works.

So as you can see above in my example, when you use a pipe (|), /proc/self/fd/0 will point to the read end of the pipe from cat set up by the shell. When you use a redirection (<), /proc/self/fd/0 will point directly to the input file, as set up by the shell.
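The same distinction can be observed from inside a program with fstat, which reports what kind of object a descriptor refers to. This is a minimal sketch assuming a POSIX system; the names classify_fd, fd_kind_name, and report_stdin are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum fd_kind { FD_PIPE, FD_REGULAR, FD_CHARDEV, FD_OTHER, FD_ERROR };

/* Hypothetical helper: classifies the object behind a descriptor. */
enum fd_kind classify_fd(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return FD_ERROR;
    if (S_ISFIFO(st.st_mode))
        return FD_PIPE;       /* e.g. cat input | prog */
    if (S_ISREG(st.st_mode))
        return FD_REGULAR;    /* e.g. prog < input */
    if (S_ISCHR(st.st_mode))
        return FD_CHARDEV;    /* e.g. a terminal */
    return FD_OTHER;
}

/* Human-readable label for each classification. */
const char *fd_kind_name(enum fd_kind k)
{
    switch (k) {
    case FD_PIPE:    return "pipe";
    case FD_REGULAR: return "regular file";
    case FD_CHARDEV: return "character device";
    case FD_OTHER:   return "other";
    default:         return "error";
    }
}

/* Prints what standard input is currently connected to. */
void report_stdin(void)
{
    printf("stdin is: %s\n", fd_kind_name(classify_fd(STDIN_FILENO)));
}
```

Calling report_stdin() from a program run the three ways shown above should correspond to the three link targets: a character device at the terminal, a pipe after cat input |, and a regular file with < input.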

As to why you're seeing this odd behavior--I'd guess that MC-Annotate is doing some checks on the file type before opening it, sees that /dev/stdin points to a pipe instead of a regular file, and bails out. You could confirm this either by reading the source code for MC-Annotate or by using the strace command to watch what's happening internally.

Note that both of these methods are a bit round-about in Bash. The accepted way to get the output of a process into a program that will only open a filename is to use process substitution:

$ MC-Annotate <(cat fess/structures/168d.pdb)

The <(...) construct expands to a path (such as /dev/fd/63) that names a file descriptor open on the read end of a pipe fed by whatever the ... is:

(caneylan@faye.sn: tmp)$ echo <(true | grep example | cat)
/dev/fd/63
Christopher Neylan

The problem lies in the order in which files are opened for reading.

/dev/stdin is not a real file; it's a symlink to the file which the current process uses as standard input. In a typical shell, it is linked to the terminal, and inherited by any process started by the shell. Keep in mind that MC-Annotate will only read from the file provided as an argument.

In the pipe example, /dev/stdin is a symlink to the file which MC-Annotate inherits as standard input: the terminal. It probably opens this file on a new descriptor (let's say 3, but it could be any value greater than 2). The pipe connects the output of cat to MC-Annotate's standard input (file descriptor 0), which MC-Annotate continues to ignore in favor of the file it opened directly.

In the redirection example, the shell connects fess/structures/168d.pdb directly to file descriptor 0 before MC-Annotate is run. When MC-Annotate starts up, it again tries to open /dev/stdin, which this time points to fess/structures/168d.pdb instead of the terminal.

So the answer lies in which file /dev/stdin is a link to in the process that executes MC-Annotate; shell redirections are set up before the process starts; pipelines after the process starts.

Does this work?

cat fess/structures/168d.pdb | MC-Annotate <( cat /dev/stdin )

A similar command

echo foo | cat <( cat /dev/stdin )

seems to work, but I won't claim the situations are identical.


[ UPDATE: does not work. /dev/stdin is still a link to the terminal, not the pipeline.]

This might provide a work-around. Now, MC-Annotate inherits its standard input from the subshell, not the current shell, and the subshell has output from cat as its standard input, not the terminal.

cat fess/structures/168d.pdb | ( MC-Annotate /dev/stdin )

I think a simple command group will work as well:

cat fess/structures/168d.pdb | { MC-Annotate /dev/stdin; }
chepner
  • Wow! Great description. Any way to trick it into not opening a new file descriptor when receiving piped input? – juniper- May 10 '13 at 14:02
  • See my update. `MC-Annotate` is still going to open a new file descriptor; I don't think the program is doing any kind of internal detection: it just opens the file that is presented on the command line. However, my update shows a way to run `MC-Annotate` in an environment where it inherits something other than the terminal as its standard input. – chepner May 10 '13 at 14:10
  • this isn't correct. the shell evaluates the pipelines *before* it sets up redirections. this is why `(echo stdout ; echo stderr 1>&2) 2>&1 | grep stdout` only prints "stdout". if redirections were set up first, as suggested here, then that command would print "stdout" and "stderr" because the dup2() call for the redirection would occur before the dup2() for the pipe--setting 2 to the terminal and then setting 1 to point to 0 of grep (so you see both). but in reality, the dup2() for the pipe happens first--1 is set to point to grep and then 2 is set to point to 1 (which is going to grep). – Christopher Neylan May 10 '13 at 15:39
  • @ChristopherNeylan I think the difference here is the added complication of opening `/dev/stdin` explicitly, which isn't done until after all the shell-specific stuff, and after `MC-Annotate` itself is executed. – chepner May 10 '13 at 15:49
  • but that's a different point entirely, and that doesn't even add complexity--/dev/stdin is just a link to /proc/self/fd/0, so MC-Annotate is just doing an open() on /proc/$$/fd/0, which will have already been set up by the shell (regardless of redirection or pipelining). my point above was that the shell does its dup2()'s for pipes *before* it handles redirections. – Christopher Neylan May 10 '13 at 16:05

From looking at this information about MC-Annotate (http://bioinfo.cipf.es/ddufour/doku.php?id=mc-annotate), the reason the pipe is not working is that MC-Annotate does not recognize the cat output as a file of type .pdb.

A pipe chains commands together: the output of the first is used as the input to the next.

The '<' ('less than', 'left arrow', 'left angle bracket') redirects the file to the command's standard input.

http://tldp.org/LDP/abs/html/io-redirection.html#IOREDIRECTIONREF2

Schleis
  • Why would this be the case though? The pdb file is just a text file. If I do the following `cat out.pdb > out1.pdb; diff out.pdb out1.pdb`, there is no difference, leading me to believe that the output of the `cat` command is identical to the original file. – juniper- May 10 '13 at 12:54
  • Because the output of cat is not a pdb file. Looking at the documentation for MC-Annotate there is a `-b` option to read binary rather than pdb file. I would guess that the logic for the file type is internal to MC-Annotate. – Schleis May 10 '13 at 13:00
  • I think you're probably right since I don't see similar behavior with other programs. However, I still can't wrap my head around why the file and the piped input should be treated differently when they are exactly the same in terms of what they contain. – juniper- May 10 '13 at 13:05
  • If the command checks that the file has a pdb extension before processing it, the piped input would fail as it won't have the extension. – Schleis May 10 '13 at 13:08
  • That's not the case. It works when using a different file extension. Although perhaps it's trying to seek on the input and failing when it comes from the terminal. – juniper- May 10 '13 at 13:11