I have the following C code:

#include <stdio.h>
#include <unistd.h>

int main()
{
    int i, pid = 0;
    for (i = 0; i < 3; i++)
    {
        fork();
        pid = getpid();
        printf("i=%d pid=%d\n", i, pid);
    }
    return 1;
}

This is supposed to create a total of 7 new processes by the time the loop finishes. Analyzing it, you can see that 14 lines should be printed before all the processes finish, and that is exactly what you see when you execute it from the command line.

However, when you redirect the output to a file with ./main > output.txt; cat output.txt, you get a completely different result. In total, 24 lines are always printed, some of them repeated for the same i and pid values, and the amount of repetition seems consistent. I'm attaching a screenshot for clarification (Execution example). The system that I'm using is Ubuntu 20.04.3 in a VirtualBox VM.

I really don't understand why that is happening. I'm guessing it has something to do with race conditions on the output buffer or some other conflict when multiple processes are writing to the file, but that doesn't explain why it doesn't happen on the terminal. Can anybody explain this odd behaviour? Thanks!

derivada

1 Answer

When the standard output is a terminal, the stream is typically line buffered. The C standard requires it not be fully buffered, meaning it must be line buffered or unbuffered; C 2018 7.21.3 6 says:

… As initially opened, … the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.

When the program executes printf("i=%d pid=%d\n", i, pid);, the output is immediately sent to the terminal, either because the stream is line buffered and the new-line character causes the output to be sent or because the stream is unbuffered and the output is always sent in each printf. Then, when the program forks, there is no pending output, because it has already been sent to the terminal. Each forked instance of the program prints only its own output.

When the standard output is redirected to a file, the stream is fully buffered. Then, when the program executes printf("i=%d pid=%d\n", i, pid);, the data is held in a buffer inside the program. It is not sent to the file immediately. (It will be sent when the buffer is full or when a flush is requested, which occurs automatically at normal program termination.) When the program forks, the buffer is copied along with the rest of the program state. Each forked instance of the program accumulates output in its own copy of the buffer.

When each forked instance of the program exits, pending data in its buffers is flushed. This includes both data added by that particular instance and data that was put into the buffer in parent processes and copied by the fork. Thus multiple copies of data are printed.

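To resolve this, execute fflush(stdout); immediately before fork();. This flushes the buffer before forking, so there is nothing pending for the child to inherit. Alternatively, request that the stream be line buffered by executing setvbuf(stdout, NULL, _IOLBF, 0); at the start of main, before any output is written. (That works here because every printf ends with a new-line character.)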

Eric Postpischil