
I am trying to implement pipes for a shell. My current implementation works: for example, to run ls | grep x | grep y | grep z, I create 4 child processes from the parent process and connect them. Are there any other ways of doing it?

For example, can I use the following flow instead? Rather than having 4 children of a single parent process, could 'grep z' be the child of 'grep y', 'grep y' the child of 'grep x', and so on?

I am curious about how piping is implemented in the bash shell. I tried downloading the source code and reading through it, but I got lost.

skynet

1 Answer


This depends on your shell and the programs you invoke.

Theoretically, you could use any variadic tree with N non-root nodes for a combinatorial explosion of possibilities, some of which are:

    shell               shell                  shell                         
    /                  /     \                /    \                         
  grep x             ls       grep y       ls    grep y                      
  /  \                \         \               /     \                      
ls   grep y            grep x   grep z      grep x    grep z                 
        \                                                                    
         grep z                                                  

A POSIX-conforming shell, though, is required to wait for the last stage to finish before continuing with the next command. Since a process can only wait on its own child processes, the last stage must be a child of the main shell. POSIX additionally allows the shell to wait for all stages, which is what bash and most other shells do (try sleep 5 | true: the prompt comes back only after 5 seconds, even though true exits immediately).

This implies that bash starts all processes as children of itself, and you can verify this with e.g. strace -f -e clone bash -c 'sleep 5 | sleep 5 | sleep 5' or sleep 5 | sleep 5 | sleep 5 & pstree -p $$, if you don't want to study execute_pipeline in execute_cmd.c in the bash source code.
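To make the flat layout concrete, here is a minimal sketch in Python (chosen only because os.fork, os.pipe and os.waitpid map one-to-one onto the system calls a C shell would use). The name run_pipeline is made up for illustration and this is not how bash itself is written; it just shows one parent forking every stage directly and then waiting on each of them.

```python
import os
import signal


def _exit_code(status):
    # Convert a raw wait status into a shell-style exit code.
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)


def run_pipeline(commands):
    """Run a list of argv lists as one flat pipeline.

    Every stage is forked directly by the caller, so the caller can wait
    on all of them. Returns the exit codes in pipeline order.
    (run_pipeline is a hypothetical helper, not a bash or Python API.)
    """
    pids = []
    prev_read = None  # read end of the pipe that feeds the next stage
    for i, argv in enumerate(commands):
        is_last = i == len(commands) - 1
        read_end, write_end = (None, None) if is_last else os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: connect stdin/stdout to the neighbouring pipes, then exec.
            try:
                # Python ignores SIGPIPE; restore the default for the program.
                signal.signal(signal.SIGPIPE, signal.SIG_DFL)
                if prev_read is not None:
                    os.dup2(prev_read, 0)
                    os.close(prev_read)
                if write_end is not None:
                    os.dup2(write_end, 1)
                    os.close(write_end)
                    os.close(read_end)
                os.execvp(argv[0], argv)
            except OSError:
                pass
            os._exit(127)  # only reached if setup or exec failed
        # Parent: remember the child and close the pipe ends it no longer needs,
        # otherwise downstream stages would never see end-of-file.
        pids.append(pid)
        if prev_read is not None:
            os.close(prev_read)
        if write_end is not None:
            os.close(write_end)
        prev_read = read_end
    # All stages are direct children, so each one can be waited on.
    return [_exit_code(os.waitpid(pid, 0)[1]) for pid in pids]


if __name__ == "__main__":
    # Roughly: ls | grep x | grep y | grep z
    print(run_pipeline([["ls"], ["grep", "x"], ["grep", "y"], ["grep", "z"]]))
```

The returned list plays the same role as one status per stage; with a nested layout the top-level process could only collect the status of the one stage it forked itself.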

This has the additional benefit of allowing bash's PIPESTATUS array and pipefail option to act on the status of all the stages in the pipeline, which you can't get if all stages aren't direct child processes.
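As a rough illustration of why having every stage's status matters, the sketch below combines a list of per-stage exit codes the way bash documents it: by default the pipeline's status is that of the last stage, and with pipefail it is that of the rightmost stage that failed, or 0 if none did. The function name pipeline_status is hypothetical and the list simply stands in for something like PIPESTATUS.

```python
def pipeline_status(statuses, pipefail=False):
    """Combine per-stage exit codes into a single pipeline exit code.

    statuses: exit codes in pipeline order (a stand-in for PIPESTATUS).
    pipefail: if True, report the rightmost non-zero code instead of
              just the last stage's code.
    (pipeline_status is a hypothetical helper for illustration.)
    """
    if not pipefail:
        return statuses[-1]
    for status in reversed(statuses):
        if status != 0:
            return status
    return 0


if __name__ == "__main__":
    codes = [1, 0]  # e.g. the codes from: false | true
    print(pipeline_status(codes))                 # last stage only
    print(pipeline_status(codes, pipefail=True))  # rightmost failure
```

Neither rule can be applied unless the shell was able to wait on every stage, which is only possible when they are all its direct children.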

Another consideration is that programs rarely handle children they didn't expect. At best you will get a zombie process; at worst, the unexpected child will interfere with the correctness of the program. This means that ls should never be a direct child of grep or vice versa. However, you can make it a grandchild through a double fork, so that init can take responsibility for it instead.
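For reference, a double fork can be sketched as follows. This is a minimal Python illustration under the assumption of a Unix-like system; spawn_detached is a made-up name. The intermediate child exits immediately, so the grandchild is orphaned and gets reparented (to init, or to a subreaper process if one is configured), and the caller only ever has to reap the short-lived intermediate.

```python
import os


def spawn_detached(argv):
    """Start argv as a grandchild of the caller via a double fork.

    Returns the exit code of the short-lived intermediate child
    (0 if the second fork succeeded). The caller never becomes the
    parent of the real program, so it never has to wait on it.
    (spawn_detached is a hypothetical helper for illustration.)
    """
    pid = os.fork()
    if pid == 0:
        # Intermediate child: fork once more, then exit right away.
        try:
            grandchild = os.fork()
        except OSError:
            os._exit(1)
        if grandchild == 0:
            # Grandchild: becomes the real program.
            try:
                os.execvp(argv[0], argv)
            except OSError:
                pass
            os._exit(127)  # only reached if exec failed
        os._exit(0)
    # Caller: the intermediate exits at once, so this wait is brief and
    # leaves no zombie behind.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    # The sleep keeps running after this script has nothing left to wait on.
    print(spawn_detached(["sleep", "1"]))
```

The trade-off is the one described above: once the program is a grandchild, whoever started it can no longer collect its exit status.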

So yes, you can use any configuration you want, but in practice (like in bash, dash, ash and zsh), it'll tend to be flat.

that other guy
  • Wow. Thanks a lot, this was exactly what I was looking for. I am going to take some time to process all the info you've given me and try to understand things. Thank you :) – skynet Jul 21 '15 at 23:27