
I have a parent process that forks and execs a child process. Before the fork, it sets up a pipe so that the child's stdout is redirected into it, and the parent reads the other end through the stream cout_file.
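
The setup code itself is not shown in this post. As a rough sketch only, this kind of redirection is usually done as below; the helper name `spawn_with_stdout_pipe`, its parameters, and the error handling are illustrative assumptions, not the actual program. The point it tries to show is that both the parent and the child close every copy of the pipe they do not need.

```cpp
#include <cassert>
#include <cstdio>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (name and signature are illustrative assumptions).
// Starts argv[0] with its stdout connected to a pipe and returns the read
// end as a FILE*. The child's pid is stored in *child_pid.
FILE* spawn_with_stdout_pipe(const char* const argv[], pid_t* child_pid,
                             bool nonblocking) {
    assert(argv != nullptr && argv[0] != nullptr && child_pid != nullptr);

    int fds[2];  // fds[0] = read end, fds[1] = write end
    if (pipe(fds) != 0) return nullptr;

    const pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return nullptr;
    }

    if (pid == 0) {
        // Child: make the write end its stdout, then drop the extra copies.
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            dup2(fds[1], STDOUT_FILENO);
            close(fds[1]);
        }
        execv(argv[0], const_cast<char* const*>(argv));
        _exit(127);  // only reached if execv failed
    }

    // Parent: close the write end, otherwise the pipe never reports EOF.
    close(fds[1]);
    if (nonblocking) {
        const int flags = fcntl(fds[0], F_GETFL, 0);
        if (flags != -1) fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);
    }

    FILE* f = fdopen(fds[0], "r");
    if (f == nullptr) {
        close(fds[0]);
        return nullptr;
    }
    *child_pid = pid;
    return f;
}
```

If the parent kept its copy of the write end open, reads on the returned stream could never see end-of-file, even after the child has exited.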

Then the parent process listens through this pipe to the stdout of the child like so:

int status;
size_t buffer_size = 0;
char buffer [1024];

do
{
  LOG(1)
  while( (buffer_size = fread(buffer, sizeof(char), sizeof(char)*1024, cout_file)) !=0)
  {
    LOG(2)
    LOG(std::string(buffer, buffer_size))
  }
}while(waitpid(child_pid,&status, WNOHANG) != -1);

//read one more time
LOG(3)
while( (buffer_size = fread(buffer, sizeof(char), sizeof(char)*1024, cout_file)) !=0)
{
  LOG(4)
  LOG(std::string(buffer, buffer_size))
}
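
In the loops above, a return value of 0 from fread() does not say why nothing was read. One way to narrow this down is to check `feof()`, `ferror()` and `errno` after each call. The sketch below is only a diagnostic aid under assumptions: `ReadState` and `read_chunk` are hypothetical names, and how stdio behaves on a non-blocking descriptor after `EAGAIN` can differ between C library implementations.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <string>

enum class ReadState { Data, WouldBlock, Eof, Error };

// Hypothetical helper (not part of the original code): performs one fread()
// and reports why it returned what it did. Any bytes read are appended to out.
ReadState read_chunk(FILE* f, std::string& out) {
    assert(f != nullptr);

    char buf[1024];
    errno = 0;
    const size_t n = fread(buf, 1, sizeof buf, f);
    const int err = errno;  // save before another call can overwrite it
    out.append(buf, n);

    if (ferror(f)) {
        if (err == EAGAIN || err == EWOULDBLOCK) {
            // Non-blocking descriptor had no (more) data: reset the error
            // indicator so that later calls start from a clean state.
            clearerr(f);
            return n > 0 ? ReadState::Data : ReadState::WouldBlock;
        }
        return ReadState::Error;
    }
    if (n > 0) return ReadState::Data;
    // Nothing read and no error: end-of-file is the only expected cause.
    return feof(f) ? ReadState::Eof : ReadState::Error;
}
```

Logging the returned state next to LOG(1) and LOG(3) would show whether the failing runs end in `WouldBlock`, `Eof` or `Error`.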

The script (child process) that is executed does this:

import sys
import time

sys.stdout.write("BLABLABLA")
sys.stdout.flush()
time.sleep(3)  # 3 seconds
sys.exit(0)

All of this works as expected about 90% of the time, but roughly 10% of the time the parent process does not see BLABLABLA, i.e. neither LOG(2) nor LOG(4) produces any output.

When it fails, I see the following output:

1 <---- at time t
1
.
.
1
3 <--- at time t+3 sec

So somehow, the script is writing to its stdout, but the parent process misses it.

The above code has been working for a few years now, and I only noticed this behaviour recently. Can anyone shed some light on what might be going wrong?

Kam
  • Side note: `sizeof(char)` by definition is 1. For any other type, using `sizeof(T)` in both the `size` and `count` arguments of `fread` is wrong. – roeland Nov 13 '15 at 02:46
  • @roeland Can you elaborate please, I don't get what you are saying? Also, do you think this is the cause of the instability? – Kam Nov 13 '15 at 02:48
  • No, it's probably not the cause. But about `fread`: see http://stackoverflow.com/questions/8589425/how-does-fread-really-work , and see the docs: http://en.cppreference.com/w/c/io/fread – roeland Nov 13 '15 at 02:56
  • Can't really see anything. You'll have to post the complete program. – roeland Nov 13 '15 at 03:13
  • Unlike regular files, pipes don't have end-of-file. My guess is that `fread()` is blocked while waiting for 1024 characters if sender doesn't close the pipe after write. It's also possible you have 2 write ends. When you duplicate the write end of fifo to stdout, you must close stdout and close the original copy of the write end. – alvits Nov 13 '15 at 03:32
  • @alvits I have configured `cout_file` to be non-blocking, so fread shouldn't block, and it isn't blocking (as evidenced by the thousands of LOG(1) printouts). I also close the unneeded end of the pipe, but maybe I do something wrong there. I will post that part of the code shortly. – Kam Nov 13 '15 at 03:38
  • If your pipe is non-blocking and `fread()` attempts to read while there is nothing in the pipe, it will return with value 0. The while loop will terminate because the return value is 0. Result? It skips those 2 blocks. Remember, the write end is not closed until the child exits. – alvits Nov 13 '15 at 03:45
  • @roeland but I'm looping on waitpid... So I will keep retrying to read until script dies – Kam Nov 13 '15 at 03:48
  • Right, but it's a timing issue. The read happens faster than cleanup. Put a delay within the parent before `fread()`ing. Alternatively, try closing stdout right after flushing it. – alvits Nov 13 '15 at 03:50
  • Run `strace -f ` and you will see what is going on. – alvits Nov 13 '15 at 03:53
  • @alvits, I don't really understand your comment. Can you elaborate on why a delay could be helpful, please? If I see a series of 1's, doesn't that mean that I am looping over fread until I see something? How would I miss anything then? – Kam Nov 13 '15 at 04:00
  • The while condition that checks for child termination will return false and end the loop as soon as child exits. During that time it's possible the kernel hasn't closed the write end of pipe therefore fread() will still return with 0 and then waitpid will terminate the while loop. Adding a delay before reading will give the parent process a chance to fread after the child has exited and the kernel cleaned up and closed the pipe. I suggest you run strace to see exactly what I am trying to say here. – alvits Nov 13 '15 at 04:10
  • @alvits but shouldn't this case be handled by the second read? After the waitpid loop exits? – Kam Nov 13 '15 at 13:36
  • Your output already proved that `fread()` keeps returning `0`. You should really `strace` it to see exactly what is happening under the layers. Remember, when there's no explicit close of streams, pipes and the like, the kernel doesn't close them until timeout occurs. – alvits Nov 13 '15 at 18:38
  • There is also the possibility that the issue is caused elsewhere. Until you post most of the code involved around it, we will only be guessing. – alvits Nov 13 '15 at 18:43
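
The comments above turn on the difference between a pipe that is momentarily empty and a pipe whose write end has been closed. With the raw descriptor, read() reports these separately: it returns 0 only at end-of-file, and fails with `EAGAIN` when a non-blocking pipe has no data yet. A minimal sketch of a reader built on that is given below. It assumes a POSIX system; `set_nonblocking` and `drain_until_eof` are hypothetical names. It would replace the fread() calls rather than be mixed with them on the same descriptor.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

// Hypothetical helper: switches a descriptor to non-blocking mode.
bool set_nonblocking(int fd) {
    const int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical helper: collects everything written to the pipe until all
// write ends are closed. Works for blocking and non-blocking descriptors.
std::string drain_until_eof(int fd) {
    assert(fd >= 0);

    std::string out;
    char buf[1024];
    for (;;) {
        const ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            out.append(buf, static_cast<size_t>(n));
            continue;
        }
        if (n == 0) break;             // EOF: every write end is closed
        if (errno == EINTR) continue;  // interrupted by a signal, retry
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            // Pipe is empty but still open: sleep until data or hang-up
            // arrives instead of spinning.
            struct pollfd p = {fd, POLLIN, 0};
            poll(&p, 1, -1);
            continue;
        }
        break;                         // unexpected error, stop reading
    }
    return out;
}
```

Once this returns, the child has closed its stdout (normally by exiting), so a single blocking waitpid() afterwards is enough to collect its status.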

0 Answers