When does piping work? Does an application have to adhere to some standard format? What are stdin and stdout in Unix?

Question

I am using a program that allows me to do

echo "Something" | app outputfile

But a similar program doesn't do that (and it's a bash script that runs Java -jar internally). Both work with

app input output

This leads me to this question: why do some programs do it and some don't?

I am basically trying to understand, in a larger sense, how programs inter-operate so fluently in *nix: the idea behind it, and what stdin and stdout are in simple layman's terms.

A simple way of writing a program that takes an input file and writes an output file is:

Write the code in such a manner that the first two positional arguments are interpreted as input and output strings, where input should be a file that is available in the file system and output is a string naming where it is going to write back the binary data.

But this is not how it is. It seems I can stream it. That's a real paradigm shift for me. I believe it's the file descriptor abstraction that makes it possible? That is, you normally write code to expect an FD as a positional argument and not the real file name strings? Which in turn means the output file gets opened and the FD is sent to the program once I execute the command in bash?

It can read from the terminal and send its output to the screen or to an application. What makes this possible? I think there is some concept of file descriptors that I am missing here. Do applications 'talk' in terms of file descriptors and not file names as strings? In Unix everything is a file; does that mean an FD is used?

A few other related reads:

http://en.wikipedia.org/wiki/Pipeline_(Unix)

What is a simple explanation for how pipes work in BASH?

confused about stdin, stdout and stderr?

possible duplicate of [confused about stdin, stdout and stderr?](http://stackoverflow.com/questions/3385201/confused-about-stdin-stdout-and-stderr) — lurker, Dec 21 '13 at 14:18
I think it's a related answer, but instead of talking in terms of stdin directly, I am asking about how apps inter-operate. So maybe it can be another question? I read that first. — Nishant, Dec 21 '13 at 15:38

score 1 · Accepted Answer · answered Dec 21 '13 at 14:57

Here's a very non-technical description of a relatively technical topic:

A file descriptor, in Unix parlance, is a small number that identifies a given file or file-like thingy. So let's talk about file-like-thingies in the Unix sense.

What's a Unix file-like-thingy? It's something that you can read from and/or write to. So standard files that live in a directory on your hard disk certainly can qualify as files. So can your terminal session – you can type into it, so it can be read, and you can read output printed on it. So can, for that matter, network sockets. So can (and we'll talk about this more) pipes.

In many cases, an application will read its data from one (or more) file descriptors, and write its results to one (or more) file descriptors. From the point of view of the core code of the application, it doesn't really care which file descriptors it's using, or what they're "hooked up" to. (Caveat: Different file descriptors can be hooked up to file-like-thingies with different capabilities, like read-only-ness; I'm ignoring this deliberately for now.) So I might have a trivial program which looks like (ignoring error checking):

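#include <unistd.h>  // read() and write()

void zcrew_up_zpelling(int in_fd, int out_fd) {
    char c;
    // read() returns 1 while a byte is available and 0 at end of input
    while (read(in_fd, &c, 1) == 1) {
        if (c == 's') c = 'z';
        write(out_fd, &c, 1);
    }
}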

Don't worry too much about what this code does (please!); instead, just notice that it's copying-and-modifying from one file descriptor to another.

So, what file descriptors are actually used here? Well, that's up to the code that calls zcrew_up_zpelling(). There are, however, some vague conventions. Many programs that need a single source of input default to using stdin as the file descriptor they'll read from; many programs that need a single source of output default to using stdout as the file descriptor they'll write to. Many of these programs provide ways to use a different file descriptor instead, often one hooked up to a named file.

Let's write a program like this:

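#include <fcntl.h>  // open() and the O_* flags

int main(int argc, char **argv) {
    int in_fd = 0;  // Descriptor of standard input
    int out_fd = 1;  // Descriptor of standard output
    if (argc >= 2) in_fd = open(argv[1], O_RDONLY);
    // O_CREAT creates the output file if it is missing; O_TRUNC empties it if it exists
    if (argc >= 3) out_fd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    zcrew_up_zpelling(in_fd, out_fd);
    return 0;
}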

So, let's run our program:

./our_program

Hmm, it's waiting for input. We didn't pass any arguments, so it's just using stdin and stdout. What if we type "Using stdin and stdout"?

Uzing ztdin and ztdout

Interesting. Let's try something different. First, we create a file containing "Hello worlds" named, let's say, hello.txt.

./our_program hello.txt

What do we get?

Hello worldz

And one more run:

./our_program hello.txt output.txt

Our program returns immediately, but creates a file called output.txt containing... our output!

Deep breath. At this point, I'm hoping that I've successfully explained how a program is able to have behavior independent of the type of file-like-thingy hooked up to a file descriptor, and also to choose what file-like-thingy gets hooked up.
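If you'd like to see that independence for yourself, here is a hedged sketch: the same routine (repeated so the block compiles on its own) hooked up to two pipes instead of files or a terminal. The helper name zcrew_through_pipes is invented for this illustration, and it assumes the input is small enough to fit in the pipe's kernel buffer; real pipelines use separate processes instead.

```c
#include <stddef.h>
#include <unistd.h>

/* Same byte-by-byte routine as before; it never learns what the fds are. */
void zcrew_up_zpelling(int in_fd, int out_fd) {
    char c;
    while (read(in_fd, &c, 1) == 1) {
        if (c == 's') c = 'z';
        if (write(out_fd, &c, 1) != 1) break;
    }
}

/* Hypothetical demo: push text through the routine using pipes only.
   Returns the number of bytes placed in out, or -1 on failure.
   Assumes len is small enough to fit in a pipe buffer. */
ssize_t zcrew_through_pipes(const char *text, size_t len,
                            char *out, size_t out_cap) {
    int in_pipe[2], out_pipe[2];
    if (pipe(in_pipe) == -1) return -1;
    if (pipe(out_pipe) == -1) {
        close(in_pipe[0]);
        close(in_pipe[1]);
        return -1;
    }

    ssize_t written = write(in_pipe[1], text, len);
    close(in_pipe[1]);  /* closing the write end lets the reader see end of input */

    if (written == (ssize_t)len)
        zcrew_up_zpelling(in_pipe[0], out_pipe[1]);
    close(in_pipe[0]);
    close(out_pipe[1]);

    ssize_t n = (written == (ssize_t)len) ? read(out_pipe[0], out, out_cap) : -1;
    close(out_pipe[0]);
    return n;
}
```

Nothing inside zcrew_up_zpelling() changed: only the caller decided that the descriptors would be pipes this time.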

What about that pipe thing I mentioned? What about streaming? Why does it work when I say:

echo Tessting | ./our_program | grep -o z | wc -l

Well, each of these programs follows some form of the conventions above. our_program, as we know, by default reads from stdin and writes to stdout. grep does the same thing. wc by default reads from stdin and writes to stdout, but since what it writes is just a count, it likes to live at the end of pipelines. And echo doesn't read from a file descriptor at all (it just reads arguments, like we did in main()), but writes to stdout, so it likes to live at the front of pipelines.

How does this all work? Well, to get much deeper we have to talk about the shell. The shell is the program that starts other command line programs, and it gets to choose what file descriptors are already hooked up to when a program starts. Those magic numbers of 0 and 1 for stdin and stdout we used earlier? That's a Unix convention, and the shell hooks up a file-like-thingy to each of those file descriptors before starting your program. When the shell sees you asking for a pipeline by entering a command with | characters, it hooks the stdout of one program directly into the stdin of the next program, using a file-like-thingy called a pipe. A file-like-thingy pipe, just like a plumbing pipe, takes whatever is put in one end and puts it out the other.
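For the curious, this is roughly the plumbing a shell might do for a two-command pipeline. It is a simplified sketch, not how any particular shell is actually written: run_pipeline is a hypothetical helper name, and error handling is kept to the bare minimum. The calls themselves (pipe, fork, dup2, execvp, waitpid) are standard POSIX.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the equivalent of `left | right`.
   Returns the exit status of the right-hand command, or -1 on failure. */
int run_pipeline(char *const left[], char *const right[]) {
    int fds[2];  /* fds[0] is the read end, fds[1] is the write end */
    if (pipe(fds) == -1) return -1;

    pid_t lpid = fork();
    if (lpid == -1) { close(fds[0]); close(fds[1]); return -1; }
    if (lpid == 0) {
        dup2(fds[1], STDOUT_FILENO);  /* descriptor 1 now refers to the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        _exit(127);                   /* only reached if exec failed */
    }

    pid_t rpid = fork();
    if (rpid == -1) {
        close(fds[0]);
        close(fds[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {
        dup2(fds[0], STDIN_FILENO);   /* descriptor 0 now refers to the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The parent must close both ends, or the reader never sees end of input. */
    close(fds[0]);
    close(fds[1]);

    int status = 0;
    waitpid(lpid, NULL, 0);
    waitpid(rpid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Notice that neither command is told it is in a pipeline. Each one simply starts with descriptor 0 or 1 already pointing at the pipe and carries on with its usual defaults.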

So, we've combined three things:

Code that deals with file descriptors, without worrying about what they're hooked to
Conventions for default file descriptors to use for normal tasks
The shell's ability to set up a program's file descriptors to "pipe" to other programs'

Together, these give us the ability to write programs that "play nice" with streaming and pipelines, without each program having to understand where it sits in the pipeline and what's happening around it.

This is as perfect an explanation as I could have wanted! So basically you have an API that takes FDs and defaults to 0 and 1 if nothing is given as input. The pipe uses that convention (0 and 1) to create a nice plugin architecture that abstracts input and output from the program itself. That's helpful. — Nishant, Dec 21 '13 at 16:49
As an aside, some programs like Vim have a convention, vim -, where - means "read from stdin", and many Unix-like apps do the same. Some have a different convention for the same thing, and a program has to support this kind of interoperability to work as expected. — Nishant, Dec 21 '13 at 17:15
