
I was experimenting with some C code for a shell implementation and found that fgets() returns duplicate lines after I fork a process. I could not understand why, and I would greatly appreciate any help.

My question is: does forking change the offset of files that are open in the parent process? That seems to be what happens in my program.

UPDATE (from the answer below by @Vadim Ponomarev and my own understanding): the problem is not that fgets() is thread-unsafe (it is thread-safe, and no threads are involved here anyway). Rather, the C library does something to stdin in the forked child that changes the file offset the child shares with the parent.

The code goes like this:

int main() {

  char buf[200];
  int r;
  pid_t pid = 0;

  while(getcmd(buf, 200, pid) >= 0) {
    fprintf(stderr, "current pid: %d\n", getpid());
    pid = fork();
    // Without forking, fgets() reads all lines normally
    if(pid == 0)
      exit(0);

    wait(&r);
  }

  return 0;
}

The getcmd() function is just a wrapper:

int
getcmd(char *buf, int nbuf, pid_t pid)
{
  memset(buf, 0, nbuf);
  if (fgets(buf, nbuf, stdin) == NULL) {
    fprintf(stderr, "EOF !!!\n");
    return -1;
  }
  fprintf(stderr, "pid: %d -- getcmd buf ======= --> %s\n", getpid(), buf);
  return 0;
}

I also have an input file, temp, with some random text:

line 1
line 2
line 3

After compiling and running a.out < temp, the output shows 6 lines instead of 3, and some of them are usually duplicated. But if I delete the line

pid = fork()
...

then the output becomes normal (each line is shown once, in order, which means fgets() returns a line only 3 times).

Any idea what is going wrong?

Output (this is what I got):

pid: 10361 -- getcmd buf ======= --> line1

current pid: 10361
pid: 10361 -- getcmd buf ======= --> line2

current pid: 10361
pid: 10361 -- getcmd buf ======= --> line3

current pid: 10361
pid: 10361 -- getcmd buf ======= --> line2

current pid: 10361
pid: 10361 -- getcmd buf ======= --> line3

current pid: 10361
pid: 10361 -- getcmd buf ======= --> line3

current pid: 10361
EOF !!!

And I expect to see this:

current pid: 10361
pid: 10361 -- getcmd buf ======= --> line1

current pid: 10361
pid: 10361 -- getcmd buf ======= --> line2

current pid: 10361
pid: 10361 -- getcmd buf ======= --> line3

EOF

A compilable version for reference:

#include <stdio.h>
#include <stdlib.h>     /* exit */
#include <string.h>     /* memset */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* wait */
#include <unistd.h>     /* fork, getpid */

int
getcmd(char *buf, int nbuf, pid_t pid)
{
  memset(buf, 0, nbuf);
  if (fgets(buf, nbuf, stdin) == NULL) {
    fprintf(stderr, "EOF !!!\n");
    return -1;
  }
  fprintf(stderr, "pid: %d -- getcmd buf ======= --> %s\n", getpid(), buf);
  return 0;
}

int main() {

  char buf[200];
  int r;
  pid_t pid = 0;

  while(getcmd(buf, 200, pid) >= 0) {
    fprintf(stderr, "current pid: %d\n", getpid());
    pid = fork();
    // Without forking, fgets() reads all lines normally
    if(pid == 0)
      exit(0);

    wait(&r);
  }

  return 0;
}

Thanks!

Vitt Volt
  • Can you please edit your question to *show* the actual (and expected) output (in full, copy-pasted as text)? Also please include an actual [Minimal, Complete, and Verifiable Example](http://stackoverflow.com/help/mcve), something that we can easily copy and test ourselves. – Some programmer dude May 17 '17 at 06:09
  • @Someprogrammerdude Hi, I added the outputs. In short, I did not expect to see duplicate lines being read. – Vitt Volt May 17 '17 at 06:13
  • Parent and child share the same file descriptor; what do you expect to happen to stdin across the fork? – Ôrel May 17 '17 at 06:15
  • You can check here for example http://stackoverflow.com/questions/4277289/are-file-descriptors-shared-when-forking – Ôrel May 17 '17 at 06:19
  • @Ôrel Thanks for the info. But in my code the child terminates right after it is created. How can that change the file offset in the parent? – Vitt Volt May 17 '17 at 06:21
  • This should work as expected. Are you sure you compiled the posted code? It looks like you removed the call to `exit` in your tests. – Jean-Baptiste Yunès May 17 '17 at 06:26
  • I cannot reproduce the behavior you describe with the posted code. I get the expected results. – Michael Burr May 17 '17 at 06:34
  • @MichaelBurr http://ideone.com/MnIbsn – n. m. could be an AI May 17 '17 at 06:48
  • Thanks for posting the link. That was the same problem I had. I'm not that familiar with C, so maybe this is not the way I should do it? – Vitt Volt May 17 '17 at 06:51
  • Are you using an online compiler by any chance? – n. m. could be an AI May 17 '17 at 06:53
  • @VittVolt: n.m.'s output is somewhat different from yours. Is yours accurate? Did you actually cut&paste a run? And how did you run the program: from a shell, or from an IDE? – rici May 17 '17 at 06:55
  • Not sure if it helps, but I'm running on my laptop using Ubuntu 16.04 with latest gcc, kernel version 4.8.0-46. – Vitt Volt May 17 '17 at 06:56
  • I have no idea why, but removing `wait` from the parent or adding `fclose(stdin)` to the child fixes the problem on ideone. I cannot reproduce this on my physical computer. – n. m. could be an AI May 17 '17 at 07:22
  • @n.m. Thanks for the suggestion, fclose(stdin) in the child does solve the problem as well. Removing wait() is not applicable in my case since I need to wait for the children to finish. I guess this is just a problem with standard libc. As suggested in the answer below, using read seems to get normal result. – Vitt Volt May 17 '17 at 15:14

2 Answers

  1. As already mentioned, parent and child share the current file position of file descriptor 0 (stdin).
  2. It seems that libc's handling of the standard streams (stdin, stdout, stderr) does something in the exiting child that changes the current stdin position:

    > strace -f ./a.out < temp 2>&1 | less
    ....
    write(2, "pid: 29487 -- getcmd buf ======="..., 45pid: 29487 -- getcmd buf ======= --> line 1
    clone(child_stack=0,flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,child_tidptr=0x7f34940f19d0) = 29488
    Process 29488 attached
    [pid 29487] wait4(-1,  <unfinished ...>
    [pid 29488] lseek(0, -14, SEEK_CUR)     = 7
    [pid 29488] exit_group(0)               = ?
    [pid 29488] +++ exited with 0 +++
    <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 29488
    

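Please note the lseek(0, -14, SEEK_CUR) in the child (pid 29488), issued right before exit_group(). The -14 matches the 14 bytes ("line 2\nline 3\n") that had already been read into the stdin buffer but not yet consumed, and the returned offset 7 is just past "line 1\n". Because the offset is shared, that data gets read again once the parent's own buffer runs out.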

  3. As a result, in my environment (openSUSE Leap 42.2, glibc-2.22-4.3.1) the program loops forever and never reaches EOF.

  4. I changed fgets() to read() in the example:

    ....
    if (read(0, buf, nbuf) == 0) {
    ....
    while(getcmd(buf, 7, pid) >= 0) {
    ....
    

and the program runs as expected (three lines, then EOF).

  5. Running strace -f again shows no more lseek() in the child!

  6. Conclusion: it seems that the stream functions (declared in stdio.h) must be used with great caution in a multi-process environment because of side effects like the one in this example.

  • Thanks for the great answer! – Vitt Volt May 17 '17 at 14:16
  • Actually, I also tried using pos = lseek(0, 0, SEEK_CUR) to get the current offset before forking, and lseek(0, pos, SEEK_SET) to restore the file position after wait(). That worked too. – Vitt Volt May 17 '17 at 15:33

I found a solution for using fgets() in this thread, which discusses the same problem. TL;DR:

exit flushes the stdio buffers in the child. ... For more details here is the link corresponding to the POSIX reference, chapter 2.5.1:

http://pubs.opengroup.org/onlinepubs/007904875/functions/xsh_chap02_05.html

The behaviour is therefore undefined, and thus is allowed to change between glibc 2.19 and 2.24.

The fix:

As written in the link above, there are two possible ways to fix the code (fd is the stream being read; in the question's program that is stdin):

if(fork() == 0) { fclose(fd); exit(1); }

or

if(fork() == 0) { _exit(1); }

tsummer2