Is output read from popen()ed FILE* complete before pclose()?

Question

pclose()'s man page says:

The pclose() function waits for the associated process to terminate and returns the exit status of the command as returned by wait4(2).

I feel like this means if the associated FILE* created by popen() was opened with type "r" in order to read the command's output, then you're not really sure the output has completed until after the call to pclose(). But after pclose(), the closed FILE* must surely be invalid, so how can you ever be certain you've read the entire output of command?

To illustrate my question by example, consider the following code:

// main.cpp

#include <iostream>
#include <cstdio>
#include <cerrno>
#include <cstring>
#include <sys/types.h>
#include <sys/wait.h>

int main( int argc, char* argv[] )
{
  FILE* fp = popen( "someExecutableThatTakesALongTime", "r" );
  if ( ! fp )
  {
    std::cout << "popen failed: " << errno << " " << strerror( errno )
              << std::endl;
    return 1;
  }

  char buf[512] = { 0 };
  // Read up to 511 bytes so buf always stays NUL-terminated.
  fread( buf, 1, sizeof buf - 1, fp );
  std::cout << buf << std::endl;

  // If we're only certain the output-producing process has terminated after the
  // following pclose(), how do we know the content retrieved above with fread()
  // is complete?
  int r = pclose( fp );

  // But if we wait until after the above pclose(), fp is invalid, so
  // there's nowhere from which we could retrieve the command's output anymore,
  // right?

  std::cout << "exit status: " << WEXITSTATUS( r ) << std::endl;

  return 0;
}

My questions, as inline above: if we're only certain the output-producing child process has terminated after the pclose(), how do we know the content retrieved with the fread() is complete? But if we wait until after the pclose(), fp is invalid, so there's nowhere from which we could retrieve the command's output anymore, right?

This feels like a chicken-and-egg problem, but I've seen code similar to the above all over, so I'm probably misunderstanding something. I'm grateful for an explanation on this.

Read until the read returns an error. If you try to close and the process is blocked trying to send output to a full pipe, you will hang. — stark, Sep 06 '18 at 23:45

n. m. could be an AI · Accepted Answer · 2018-09-07T16:16:08.037

3

TL;DR executive summary: how do we know the content retrieved with the fread() is complete? — we've got an EOF.

You get an EOF when the child process closes its end of the pipe. This can happen when it calls close explicitly or exits. Nothing can come out of your end of the pipe after that. After getting an EOF you don't know whether the process has terminated, but you do know for sure that it will never write anything to the pipe.

By calling pclose you close your end of the pipe and wait for termination of the child. When pclose returns, you know that the child has terminated.
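
A minimal sketch of that sequence, assuming a POSIX system where popen() runs the command through the shell. The names run_and_capture and CommandResult are hypothetical, not part of any library. It reads until fread() reports EOF or an error, and only then calls pclose() to collect the exit status:

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/wait.h>

// Hypothetical result type for this sketch.
struct CommandResult {
    bool ok = false;        // true if the command ran and exited normally
    int exit_code = -1;     // valid only when ok is true
    std::string output;     // everything the command wrote to stdout
};

// Hypothetical helper: run cmd via popen() and capture its complete output.
CommandResult run_and_capture(const char* cmd)
{
    assert(cmd != nullptr);
    CommandResult result;

    FILE* fp = popen(cmd, "r");
    if (!fp) return result;

    char buf[256];
    std::size_t n;
    // fread() returns 0 only at EOF or on error. EOF means every write end
    // of the pipe has been closed, so no further output can arrive.
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        result.output.append(buf, n);

    const bool read_error = ferror(fp) != 0;
    const int status = pclose(fp);   // closes our end and waits for the child
    if (!read_error && status != -1 && WIFEXITED(status)) {
        result.ok = true;
        result.exit_code = WEXITSTATUS(status);
    }
    return result;
}
```

For example, run_and_capture("echo first; sleep 1; echo second; exit 3") should yield both lines of output and exit code 3, despite the child's pause in the middle.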

If you call pclose without getting an EOF, and the child tries to write stuff to its end of the pipe, it will fail (in fact it will get a SIGPIPE and probably die).
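
That case can be observed directly. In this sketch, close_before_eof and ended_by_sigpipe are hypothetical helpers, and the command is assumed to be one that writes endlessly (for example `yes`). The exact status depends on the environment: the shell may report the signal as exit code 128 + SIGPIPE, and if SIGPIPE is ignored in the parent, the child inherits that and sees a failed write (EPIPE) instead of the signal.

```cpp
#include <cassert>
#include <cstdio>
#include <signal.h>
#include <sys/wait.h>

// Hypothetical helper: start a command that never stops writing, read one
// small chunk, then pclose() without waiting for EOF.
// Returns pclose()'s wait status, or -1 on failure.
int close_before_eof(const char* endless_cmd)
{
    assert(endless_cmd != nullptr);
    FILE* fp = popen(endless_cmd, "r");
    if (!fp) return -1;

    char buf[64];
    if (fread(buf, 1, sizeof buf, fp) == 0) {   // no output at all
        pclose(fp);
        return -1;
    }
    // Closing the read end makes the child's next write() fail, normally
    // by delivering SIGPIPE. pclose() then waits for the child to end.
    return pclose(fp);
}

// True if the wait status indicates termination by SIGPIPE, either of the
// waited-for process itself or as reported by a shell (128 + signal number).
bool ended_by_sigpipe(int status)
{
    if (WIFSIGNALED(status)) return WTERMSIG(status) == SIGPIPE;
    return WIFEXITED(status) && WEXITSTATUS(status) == 128 + SIGPIPE;
}
```

With default signal handling, ended_by_sigpipe(close_before_eof("yes")) would typically be true.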

There is absolutely no room for any chicken-and-egg situation here.

edited Sep 07 '18 at 16:16

answered Sep 07 '18 at 10:57

I guess the OP is confusing process lifetime with "program" lifetime, particularly as it pertains to stream open/close, which is sort of fair as it's not necessarily intuitive if you're a C++ programmer rather than a Linux expert. ("But `std::cout` is open until the end of my program!") But indeed this is all deterministic and safe, just confusing. – Lightness Races in Orbit Sep 07 '18 at 11:03
@LightnessRacesinOrbit - I was also confused over what EOF really is. I was thinking: "the child process outputs _some stuff_, then maybe `sleep`s for a while; meanwhile the `fread` in the parent process reads _some stuff_, and it must end/unblock there since that's all that's currently readable; later maybe the child process decides to output _other stuff_, which the `fread` will now miss." Anyhow, that's all cleared up now; I learned a little about EOF, and I'm better for having learned something new. :) – StoneThrow Sep 07 '18 at 15:59
@StoneThrow I mean, that scenario is certainly possible, if the parent doesn't keep reading all the way to EOF before closing its end of the pipe. In fact it's a common programming mistake! – Lightness Races in Orbit Sep 07 '18 at 16:11
@LightnessRacesinOrbit - agreed: based on my newfound understanding of the subject. For the sake of simplifying the premise of this question, I just made the assumption `fread`'s supplied buffer was large enough to read all the output of the `popen`ed `command`. But this also explains why some examples of using `popen` use `fgets` in a loop to read from the opened file. – StoneThrow Sep 07 '18 at 17:03
@StoneThrow Exactly. – Lightness Races in Orbit Sep 10 '18 at 10:25

score 1 · Answer 2 · answered Sep 07 '18 at 01:08

I learned a couple things while researching this issue further, which I think answer my question:

Essentially: yes it is safe to fread from the FILE* returned by popen prior to pclose. Assuming the buffer given to fread is large enough, you will not "miss" output generated by the command given to popen.

Going back and carefully considering what fread does: it effectively blocks until (size * nmemb) bytes have been read or end-of-file (or error) is encountered.
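
That blocking behaviour can be checked with a small experiment. The sketch below issues exactly one fread() against a command that pauses between two writes; single_fread is a hypothetical helper, and the 512-byte buffer mirrors the question's code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: run cmd and return whatever ONE fread() call delivers.
std::string single_fread(const char* cmd)
{
    assert(cmd != nullptr);
    FILE* fp = popen(cmd, "r");
    if (!fp) return {};

    char buf[512];
    // A single call: it returns only once 512 bytes have arrived, or the
    // stream hits EOF or an error, whichever happens first.
    const std::size_t n = fread(buf, 1, sizeof buf, fp);

    pclose(fp);
    return std::string(buf, n);
}
```

With the command "echo first; sleep 1; echo second", the single call returns both lines: fread() keeps waiting through the pause because neither 512 bytes nor EOF has arrived yet. A raw read(2) on the pipe's file descriptor behaves differently and may return as soon as some data is available.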

Thanks to "C - pipe without using popen", I understand better what popen does under the hood: it creates a pipe, forks, and in the child process uses dup2 to redirect stdout to the write end of the pipe. Importantly, the child then performs some form of exec to run the specified command, and when that process terminates, its open file descriptors, including 1 (stdout), are closed. I.e. termination of the specified command is the condition under which the child process's stdout is closed.

Next, I went back and thought more carefully about what EOF really is in this context. At first, I was under the loosey-goosey and mistaken impression that "fread tries to read from a FILE* as fast as it can and returns/unblocks after the last byte is read". That's not quite true. As noted above, fread will block until its target number of bytes is read, or EOF or an error is encountered. The FILE* returned by popen comes from an fdopen of the read end of the pipe used by popen, so its EOF occurs when the child process's stdout, which was dup2ed onto the write end of the pipe, is closed.

So, in the end what we have is: popen creating a pipe whose write end gets the output of a child process running the specified command, and whose read end is fdopened to a FILE* passed to fread. (Assuming fread's buffer is big enough), fread will block until EOF occurs, which corresponds to closure of the write end of popen's pipe resulting from termination of the executing command. I.e. because fread blocks until EOF is encountered, and EOF occurs once command, running in popen's child process, terminates, it's safe to use fread (with a sufficiently large buffer) to capture the complete output of the command given to popen.
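
The mechanism described above can be sketched roughly as follows. This is an illustration only, not the actual libc implementation: open_reader, close_reader and capture are hypothetical names, and the command is assumed to be run through /bin/sh.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for popen(cmd, "r"). Returns the stream and stores
// the child's PID in `child`; returns nullptr on failure.
FILE* open_reader(const char* cmd, pid_t& child)
{
    int fds[2];                       // fds[0] = read end, fds[1] = write end
    if (pipe(fds) == -1) return nullptr;

    child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return nullptr;
    }

    if (child == 0) {
        // Child: make stdout refer to the write end, then run the command.
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            dup2(fds[1], STDOUT_FILENO);
            close(fds[1]);
        }
        execl("/bin/sh", "sh", "-c", cmd, static_cast<char*>(nullptr));
        _exit(127);                   // reached only if execl() failed
    }

    // Parent: drop its copy of the write end; otherwise the pipe would
    // never report EOF, because one writer (the parent) would remain.
    close(fds[1]);
    FILE* fp = fdopen(fds[0], "r");
    if (!fp) {
        close(fds[0]);
        waitpid(child, nullptr, 0);   // avoid leaving a zombie behind
    }
    return fp;
}

// Hypothetical stand-in for pclose(): close our end, then wait for the child.
int close_reader(FILE* fp, pid_t child)
{
    fclose(fp);
    int status = 0;
    return waitpid(child, &status, 0) == -1 ? -1 : status;
}

// Read everything the command prints, using the two helpers above.
std::string capture(const char* cmd)
{
    assert(cmd != nullptr);
    pid_t child = -1;
    FILE* fp = open_reader(cmd, child);
    if (!fp) return {};

    std::string out;
    char buf[256];
    std::size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        out.append(buf, n);

    close_reader(fp, child);
    return out;
}
```

The close(fds[1]) in the parent is the step that makes EOF meaningful: the read end reports EOF only when no process holds the write end open any longer.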

Grateful if anyone can verify my inferences and conclusions.

score 0 · Answer 3 · answered Sep 06 '18 at 20:42

Read the documentation for popen more carefully:

The pclose() function shall close a stream that was opened by popen(), wait for the command to terminate, and return the termination status of the process that was running the command language interpreter.

It blocks and waits.

tadman

I understand that _pclose_ blocks and waits. And the `popen` documentation says "After popen(), both the parent and the child process shall be capable of executing independently before either terminates." I.e. you don't know the child process has terminated until the blocking `pclose`. In other words: no matter how many times you read from `fp` before closing the `FILE*`, how can you know the child process is not producing more output? It seems like you cannot know until after `pclose()` at which time you can't read from `fp` anymore. – StoneThrow Sep 06 '18 at 20:54
The `popen()` family of functions isn't as capable as some of the other ones, so if you need more control you'll need to use something more low-level where you can monitor for signals like the remote being closed. See things like this [example using `pipe`](https://stackoverflow.com/questions/15196784/how-to-wait-till-data-is-written-on-the-other-end-of-pipe) where low-level file descriptors are useful for getting this sort of information. `FILE*` is a wrapper. – tadman Sep 06 '18 at 20:56
Was I correct, then, to say that there is a potential chicken-and-egg problem with the `popen()` family of functions as used above? I get what you're saying about needing something more low level, possibly using file descriptors, because to address my question, it seems that you'd want to know _that_ the child process terminated before invalidating objects containing the child process' output - true? I.e. you'd kind of want to split the functionality encapsulated by `pclose()`, right? – StoneThrow Sep 06 '18 at 21:12
It's not so much a chicken-and-egg problem as in if you're using `popen()` you're expressing that you don't care about the details and all you care about is grabbing the output if and when it's done. A `FILE*` structure is an abstraction that doesn't give you the details on the subprocess PID. A `pipe`/`fork`/`exec` approach does, but it's way more work. – tadman Sep 06 '18 at 21:25

tunglt · Answer 4 · 2018-09-07T10:43:44.220

popen() is just a shortcut for a series of calls: pipe, fork, dup2, execv, fdopen, etc. It gives us easy access to the child's STDOUT or STDIN through stream operations.

After popen(), the parent and the child process execute independently. pclose() is not a 'kill' function; it closes the stream and then just waits for the child process to terminate. Because the parent's end of the pipe is closed at that point, any output the child generates after pclose() has been called is lost.

To avoid this data loss, call pclose() only once the stream has reached end-of-file: fgets() returns NULL, or fread() returns fewer items than requested, and feof() returns true. At that point the child has closed its end of the pipe (typically by exiting), so it cannot produce any more output.

Here is an example of using popen() with fread(). The function returns -1 if the command could not be executed or its output could not be read, and 0 on success. The child's output is returned in szResult.

#include <cstdio>
#include <string>

int exec_command( const char * szCmd, std::string & szResult ){

    printf("Execute command : [%s]\n", szCmd );

    FILE * pFile = popen( szCmd, "r");
    if(!pFile){
            printf("Execute command : [%s] FAILED !\n", szCmd );
            return -1;
    }

    char buf[256];
    size_t nRead;

    //try to read up to 256 bytes from the stream, this operation is BLOCKING ...
    //fread() returns 0 only at end-of-file or on a read error, so the loop
    //also ends on errors, which a bare !feof() test would never notice.
    while( (nRead = fread(buf, 1, sizeof buf, pFile)) > 0 ){
        szResult.append(buf, nRead);
    }

    //remember whether the loop ended because of an error rather than EOF
    bool bReadError = ferror(pFile) != 0;

    //EOF: the child has closed its end of the pipe. Clean it up with pclose()
    //or we have another zombie in the process table.
    if( pclose(pFile) == -1 || bReadError ){
        printf("Execute command : [%s] FAILED !\n", szCmd );
        return -1;
    }

    printf("Exec command [%s] return : \n[%s]\n",  szCmd, szResult.c_str() );
    return 0;
}

Note that all file operations on the returned stream work in BLOCKING mode; the stream is opened without the O_NONBLOCK flag. fread() can block forever if the child process hangs and never terminates, so use popen() only with trusted programs.

To take more control over the child process and avoid blocking file operations, we should call pipe, fork/vfork and an exec-family function ourselves, set the O_NONBLOCK flag on the pipe's file descriptor, use poll() or select() to determine whether data is available, and then use read() to read from the pipe.

Use waitpid() with WNOHANG periodically to see whether the child process has terminated.
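
A rough sketch of that lower-level approach, under stated assumptions: capture_with_timeout and TimedResult are hypothetical names, the command is run through /bin/sh, and a child that exceeds the deadline is simply killed with SIGKILL (other policies are possible).

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical result type for this sketch.
struct TimedResult {
    bool timed_out = false;   // true if the child had to be killed
    int status = -1;          // wait status as filled in by waitpid()
    std::string output;       // what was read before EOF or the deadline
};

// Hypothetical helper: collect cmd's stdout for at most timeout_ms.
TimedResult capture_with_timeout(const char* cmd, int timeout_ms)
{
    using clock = std::chrono::steady_clock;
    assert(cmd != nullptr);
    TimedResult result;
    int fds[2];
    if (pipe(fds) == -1) return result;

    pid_t child = fork();
    if (child == -1) { close(fds[0]); close(fds[1]); return result; }
    if (child == 0) {                       // child: stdout -> pipe, then exec
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) { dup2(fds[1], STDOUT_FILENO); close(fds[1]); }
        execl("/bin/sh", "sh", "-c", cmd, static_cast<char*>(nullptr));
        _exit(127);                         // reached only if execl() failed
    }
    close(fds[1]);                          // parent keeps only the read end
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);

    const auto deadline = clock::now() + std::chrono::milliseconds(timeout_ms);
    auto remaining_ms = [&] {
        const auto left = std::chrono::duration_cast<std::chrono::milliseconds>(
                              deadline - clock::now()).count();
        return left > 0 ? static_cast<int>(left) : 0;
    };

    bool eof = false;
    char buf[256];
    while (!eof && remaining_ms() > 0) {
        pollfd pfd{fds[0], POLLIN, 0};
        const int ready = poll(&pfd, 1, remaining_ms());
        if (ready == 0) break;              // deadline reached, nothing to read
        if (ready == -1) { if (errno == EINTR) continue; break; }
        const ssize_t n = read(fds[0], buf, sizeof buf);
        if (n > 0) result.output.append(buf, static_cast<std::size_t>(n));
        else if (n == 0) eof = true;        // all write ends are closed
        else if (errno != EAGAIN && errno != EINTR) break;
    }
    close(fds[0]);

    // EOF only proves the write end is closed, so check for exit separately.
    pid_t reaped = 0;
    while ((reaped = waitpid(child, &result.status, WNOHANG)) == 0 && remaining_ms() > 0)
        poll(nullptr, 0, 10);               // sleep 10 ms between checks
    if (reaped == 0) {                      // still running at the deadline
        result.timed_out = true;
        kill(child, SIGKILL);
        waitpid(child, &result.status, 0);
    }
    return result;
}
```

Note that SIGKILL here reaches only the direct child (the shell, or whatever it exec'ed); processes it started in turn are not signalled by this sketch.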
