
I'm trying to re-implement the pipe `|` operator.

The program will be executed as follows:

$ ./exe infile cmd cmd2 cmd3 cmdn outfile

where the program first reads from infile, passes that data through each of the commands in turn, and finally writes the result to outfile (the equivalent of `< infile cmd | cmd2 | cmd3 | ... | cmdn > outfile` in a shell).

IMPLEMENTATION:

My pseudo code looks like the following:

duplicate infile's descriptor onto stdin.
for each command:
    pipe();
    fork();
    if (we're in the child process)
        duplicate the pipe's read end onto stdin.
        execute command.
    else (we're in the parent process)
        duplicate the pipe's write end onto stdout.
        execute command.
        wait for child process.
duplicate outfile's descriptor onto stdout.

CODE:

    int infile_fd = open(args->infile, O_RDONLY);
    dup2(infile_fd, STDIN_FILENO);
    // loop over the commands
    while (i < args->len)
    {
        // error handling stuff
        args->cmds[i].cmd = check_exist(args->cmds[i], args);

        if (pipe(args->cmds[i].fd) == -1)
        {
            perror("piping failed");
            exit(EXIT_FAILURE);
        }

        if ((args->cmds[i].pid = fork()) == -1)
        {
            perror("fork failed");
            exit(EXIT_FAILURE);
        }
        else
        {
            if (args->cmds[i].pid == 0)
            {
                // child process
                close(args->cmds[i].fd[WRITE_END]);
                dup2(args->cmds[i].fd[READ_END], STDIN_FILENO);
                if (execve(args->cmds[i].cmd, args->cmds[i].flags, env) == -1)
                {
                    perror("execve failed");
                    exit(EXIT_FAILURE);
                }
                i++;
            }
            else
            {
                // parent process
                close(args->cmds[i].fd[READ_END]);
                dup2(args->cmds[i].fd[WRITE_END], STDOUT_FILENO);
                if (execve(args->cmds[i].cmd, args->cmds[i].flags, env) == -1)
                {
                    perror("execve failed");
                    exit(EXIT_FAILURE);
                }
                wait(NULL);
                i++;
            }
        }
    }
    int outfile_fd = open(args->outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    // redirect stdout to outfile
    dup2(outfile_fd, STDOUT_FILENO);

OUTPUT:

  • The output is garbage; the commands read data off the stack and start printing environment variables.

EXPECTED OUTPUT:

  • Data should be read from "infile" first, passed through the commands, and arrive at "outfile" at the end of the piping channel.

It would be silly to ask what I'm doing wrong, because I'm probably doing it all wrong, so here's a better question: how can I do it right?

  • Note that all the processes in a pipeline _must_ run concurrently. You must start them all up, and only then wait for them to die. The trouble is that if a process creates more data than fits in a pipe buffer (usually 64 KiB these days, but it used to be just 5 KiB), then it will block until some process reads some data. And if the other process isn't running yet, that blockage will be indefinite. The parent process needs to ensure it has no pipes open before it waits for the children to die. – Jonathan Leffler Apr 18 '22 at 15:42
  • Both paths in your code after a successful `fork` end up calling `execve`. Definitionally, no code will run in your program after that point. So you couldn't possibly handle multiple commands; at best (assuming no other errors) you'd handle input to cmd1 to cmd2, and now the parent *is* cmd2 and cmd3+ will never be used. – ShadowRanger Apr 18 '22 at 15:43
  • @JonathanLeffler: One solution to the "parent must ensure no pipes open" problem may be to always open with `O_CLOEXEC` (with `open`, and using `pipe2` instead of `pipe` to allow passing that flag), and have the parent itself `exec` the final command (thereby ensuring all processes involved `exec` to cause all but their own std handles to close, without needing to worry about which handles need to be closed explicitly). This assumes the child processes require no explicit monitoring of course. – ShadowRanger Apr 18 '22 at 15:46
  • @ShadowRanger I'm not sure I've understood why the statement "parent must ensure no pipes are open" is the problem, but nevertheless a simple solution would be `if (args->cmds[i].fd[0] != -1) close(args->cmds[i].fd[0]); if (args->cmds[i].fd[1] != -1) close(args->cmds[i].fd[1]);` What do you think? – interesting Apr 18 '22 at 15:59
  • @ShadowRanger — using `pipe2()` limits portability, which may or may not be a problem. I think the parent probably should be waiting for the child processes (or, at least, the last child in the pipeline) to die before it exits itself. But I may be putting my 'normal spin' on the requirements. I see that `dup2()` removes the `FD_CLOEXEC` flag (equivalent to `O_CLOEXEC`) if the target file descriptor is not the same as the one being duplicated, so when the children duplicate descriptors to stdin or stdout, they lose the CLOEXEC status (which is good). – Jonathan Leffler Apr 18 '22 at 16:02
  • @home — a loop closing unwanted open file descriptors in the parent process is reasonable (and in the children too). – Jonathan Leffler Apr 18 '22 at 16:03
  • @JonathanLeffler Sorry, when you say a loop, I think you mean one over all previously opened pipes, right? Please correct me if I'm wrong. If that's the case, am I not doing it already? In the child and the parent, before I use the read end or the write end, I close the opposite end to allow one-way communication. Am I wrong? – interesting Apr 18 '22 at 16:15
  • If you are looking at `… | cmdP | cmdQ | cmdR | …`, and you are working in process Q, then you need to keep the read end of the PQ pipe open as standard input, and the write end of the QR pipe open as standard output. The write end of PQ and the read end of QR must be closed. If the processes have access to any other pipes (the OP pipe or the RS pipe, for example), those must be closed too. The parent process that coordinates the pipeline must create the pipes in sequence, and must close all its file descriptors for all the pipes. This is the most common arrangement; there are other ones. – Jonathan Leffler Apr 18 '22 at 16:19
  • @home: "parent must ensure no pipes are open" isn't a *problem*, it's just something where it's complex enough that people get it wrong, a lot (in many cases they must be closed in the children too when not used in the children, which makes things uglier). And getting it wrong is a big deal; if a *single* read end descriptor is held open in a *single* process that forgets to close it and blocks on a child using the write end in blocking fashion (e.g. back and forth pipe comms, or a parent waiting on a chain of children as in your case), the whole setup deadlocks. – ShadowRanger Apr 18 '22 at 16:40
  • For a working example, see my answer: https://stackoverflow.com/questions/52823093/fd-leak-custom-shell/52825582#52825582 – Craig Estey Apr 18 '22 at 17:46
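To make the comments above concrete, here is a minimal sketch of the arrangement Jonathan Leffler and ShadowRanger describe: `execve` appears only in the child branch, so the parent survives to start every stage and all stages run concurrently, and the parent closes every pipe descriptor it holds before it waits. The names `run_pipeline` and `cmds` and the paths in `main` are hypothetical stand-ins, not the OP's structures, and each `cmds[i][0]` is assumed to be an already-resolved full path (the job of the OP's `check_exist`):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/wait.h>

    extern char **environ;

    /* cmds[i] is a NULL-terminated argv whose [0] is a full path. */
    static void run_pipeline(const char *infile, const char *outfile,
                             char **cmds[], int ncmds)
    {
        int in_fd = open(infile, O_RDONLY);
        if (in_fd == -1) { perror(infile); exit(EXIT_FAILURE); }

        for (int i = 0; i < ncmds; i++) {
            int fds[2];
            int last = (i == ncmds - 1);

            if (!last && pipe(fds) == -1) { perror("pipe"); exit(EXIT_FAILURE); }

            pid_t pid = fork();
            if (pid == -1) { perror("fork"); exit(EXIT_FAILURE); }

            if (pid == 0) {                      /* child: exec one command */
                dup2(in_fd, STDIN_FILENO);       /* read from the previous stage */
                close(in_fd);
                if (!last) {
                    dup2(fds[1], STDOUT_FILENO); /* write to the next stage */
                    close(fds[0]);               /* this child never reads it */
                    close(fds[1]);
                } else {                         /* last stage writes to outfile */
                    int out_fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
                    if (out_fd == -1) { perror(outfile); exit(EXIT_FAILURE); }
                    dup2(out_fd, STDOUT_FILENO);
                    close(out_fd);
                }
                execve(cmds[i][0], cmds[i], environ);
                perror("execve");                /* reached only if exec failed */
                _exit(EXIT_FAILURE);
            }

            /* parent: never execs, just drops every pipe end it holds */
            close(in_fd);
            if (!last) {
                close(fds[1]);  /* a leaked write end would block the reader forever */
                in_fd = fds[0]; /* the next child reads from this pipe */
            }
        }

        while (wait(NULL) > 0)   /* no pipe fds open here, so no deadlock */
            ;
    }

    int main(void)
    {
        char *cat[] = { "/bin/cat", NULL };  /* example commands; paths vary */
        char *tr[]  = { "/usr/bin/tr", "a-z", "A-Z", NULL };
        char **cmds[] = { cat, tr };

        run_pipeline("infile", "outfile", cmds, 2);
        return 0;
    }

Note the difference from the loop in the question: the parent reaches `wait` at all only because it never calls `execve` itself.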
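And a hedged sketch of the `O_CLOEXEC` variant ShadowRanger suggests, assuming Linux (`pipe2()` needs `_GNU_SOURCE` and limits portability, as noted above). `dup2()` clears the close-on-exec flag on the new descriptor, so the duplicated stdin/stdout survive `execve()` while every other `O_CLOEXEC` pipe end closes automatically at exec time; `start_stage` is a hypothetical helper, not a real API:

    #define _GNU_SOURCE              /* pipe2() is Linux-specific */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Start one middle stage of a pipeline and return the read end the
     * next stage should use; in_fd is assumed to carry FD_CLOEXEC already
     * (opened with O_CLOEXEC, or the read end of an earlier pipe2 pipe). */
    static int start_stage(int in_fd, char *argv[], char *envp[])
    {
        int fds[2];
        if (pipe2(fds, O_CLOEXEC) == -1) { perror("pipe2"); exit(EXIT_FAILURE); }

        pid_t pid = fork();
        if (pid == -1) { perror("fork"); exit(EXIT_FAILURE); }

        if (pid == 0) {
            dup2(in_fd, STDIN_FILENO);    /* dup2 clears FD_CLOEXEC on fd 0 */
            dup2(fds[1], STDOUT_FILENO);  /* ...and on fd 1 */
            execve(argv[0], argv, envp);  /* all other O_CLOEXEC fds close here */
            perror("execve");
            _exit(EXIT_FAILURE);
        }
        close(in_fd);     /* the parent never execs, so it must still */
        close(fds[1]);    /* close its own copies by hand... */
        return fds[0];
    }

...unless, as suggested above, the parent itself execs the final command, in which case exec cleans up after the parent too.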

0 Answers