As practice for multiprocess programming in C, I have been trying to make a program that uses files as a way of communicating between processes. This part of the program is supposed to use a child process to read the contents of a file and copy them into a temporary file, and then copy them from the temporary file into an output file (preferably doing this line by line). The problem is that, after reading all the lines of the file without any apparent issue, the loop just goes back to the 2nd line and starts over, again and again and again... for some reason.

It works fine when I use a single process, but that is not the goal I'm trying to achieve. One other thing that seemed to help was replacing the fprintf() calls with write() calls (that got rid of lines being repeated along the way in the output file). I want to think the problem has something to do with fgets() not returning NULL when I think it should, but I have no idea why that would happen. The only lesson I seem to be taking from this so far is to never use fork() inside a loop, since that was pretty much the one thing my code had that none of the solutions I found did. Here are some details of the code I'm using:

Libraries:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <string.h>

The program in question (it is in main() after some assertions):

  //open files with the pathnames given as input; creates or overwrites the output file
  FILE *Nombres = fopen(argv[1], "r");
  FILE *Apellidos = fopen(argv[2], "r");
  FILE *Output = fopen(argv[3], "w");

  //temp file with a fixed name so I can test around
  char *tempname = "aaa.txt";
  FILE *Tmp = fopen(tempname, "w+");
  char linea[MAXIMA_LONGITUD_LINEA];
  pid_t hijoid;

  while (fgets(linea, MAXIMA_LONGITUD_LINEA, Nombres) != NULL) {
    printf("%s\n", linea);

    Tmp = fopen(tempname, "w+"); //clear tempfile contents
    hijoid = fork();

    if (hijoid == 0) {
      write(fileno(Tmp), linea, strlen(linea));
      exit(0);
    } else {
      waitpid(hijoid, NULL, 0);
      rewind(Tmp);

      if (fgets(linea, MAXIMA_LONGITUD_LINEA, Tmp) != NULL) {
        write(fileno(Output), linea, strlen(linea));
      } else {
        printf("Line couldn't be read.\n");
      }

    }
  }
}

Edit: This is a college assignment intended to measure the time difference between using pipes and signals vs. using neither of them. Seeing that there was no progress with this method, and that it's not the way it should be done anyway, I just went ahead and used pipes instead without many issues. I wouldn't mind sharing that code, but I think that's already somewhat off topic.

Neon
  • When reading or writing files, the data is buffered in memory. So what one process sees in a file isn't necessarily what another process sees in that same file. – user3386109 Feb 25 '23 at 23:24
  • Note that with fgets you can only read 1 less than the buffer size as it always adds a 0 byte after the characters read. – stark Feb 25 '23 at 23:31
  • IMO, the best approach is to stop using files, and learn how to use pipes, because that's what pipes are for. – user3386109 Feb 25 '23 at 23:31
  • When writing to a file, use `fflush()` to flush the buffer when you need the reader to see it. – Barmar Feb 26 '23 at 00:16
  • The following [LINK](https://stackoverflow.com/questions/50110992/why-does-forking-my-process-cause-the-file-to-be-read-infinitely) describes your problem, and a solution to it in the check-marked answer... specifically, in the child process, before doing anything else, insert: fclose(Nombres); The well-documented answer is five years old, but that solution appears to still work. – TonyB Feb 27 '23 at 09:07

1 Answer

Other things that seemed to help were replacing the fprintf() calls with write() calls (got rid of lines getting repeated along the way in the output file).

The typical implementations of read() and write() do not buffer [1] in user space, whereas the standard I/O streams are buffered.

C17 § 7.21.3p7:

the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.

C17 § 5.1.2.3p7:

What constitutes an interactive device is implementation-defined.

With fork(), the entire virtual address space of the parent is replicated in the child. If, before calling fork(), the parent had buffered data in a stream, both the parent and the child end up holding the same buffered data. In addition, the child inherits duplicates of the parent's file descriptors, which share a single file offset; when the child's exit() closes its copy of the Nombres stream, POSIX requires the offset of a seekable input stream to be moved back to the stream's logical position, so the parent ends up re-reading lines it had already read.
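
For illustration (my own sketch, not part of the original answer), the duplicated-buffer effect can be reproduced in a few lines: force full buffering with setvbuf(), print one line, fork(), and the line ends up in the output twice, once flushed by the child's exit() and once by the parent when it returns from main():

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
  //force full buffering so the effect is visible even on a terminal
  setvbuf(stdout, NULL, _IOFBF, BUFSIZ);

  printf("buffered before fork\n");  //sits in the stdio buffer for now

  pid_t pid = fork();                //the child gets a copy of that buffer
  if (pid == 0) {
    exit(EXIT_SUCCESS);              //exit() flushes the child's copy
  }
  waitpid(pid, NULL, 0);
  return 0;                          //returning from main() flushes the parent's copy
}

Running it prints "buffered before fork" twice.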

A call to fflush() with a NULL argument flushes all streams for which the behavior is defined: in ISO C that covers output streams, and POSIX additionally defines fflush() for seekable input streams, where it synchronizes the underlying file offset with the stream's position. Another option is to disable buffering altogether with setvbuf()/setbuf().
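
Applied to the loop from the question, a minimal (untested) sketch would flush everything right before forking, so the child starts with empty buffers and an already synchronized file offset:

  while (fgets(linea, MAXIMA_LONGITUD_LINEA, Nombres) != NULL) {
    printf("%s\n", linea);

    Tmp = fopen(tempname, "w+");  //clear tempfile contents

    fflush(NULL);  //flush every open stream before fork() so the child
                   //does not inherit Nombres' buffered, not-yet-consumed data
    hijoid = fork();

    /* ... rest of the loop unchanged ... */
  }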

The goal of the program is unclear to me. The conventional method of interprocess communication in UNIX is the pipe(), not files. See UNIX Network Programming: Interprocess Communication.
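
For comparison, here is a minimal pipe() sketch (my own illustration, not code from the question): the parent writes one line into the pipe and the child reads it back, with no temporary file involved:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
  int fd[2];
  if (pipe(fd) == -1) { perror("pipe"); return EXIT_FAILURE; }

  pid_t pid = fork();
  if (pid == -1) { perror("fork"); return EXIT_FAILURE; }

  if (pid == 0) {              //child: read one message from the pipe
    char buf[128];
    close(fd[1]);              //close the unused write end
    ssize_t n = read(fd[0], buf, sizeof buf - 1);
    if (n > 0) {
      buf[n] = '\0';
      printf("child got: %s", buf);
    }
    close(fd[0]);
    _exit(EXIT_SUCCESS);
  }

  close(fd[0]);                //parent: close the unused read end
  const char *linea = "one line of text\n";
  write(fd[1], linea, strlen(linea));
  close(fd[1]);                //closing the write end signals EOF to the child
  waitpid(pid, NULL, 0);
  return 0;
}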


Footnote:

[1] The term unbuffered means that each read or write invokes a system call in the kernel. These unbuffered I/O functions are not part of ISO C, but are part of POSIX.1 and the Single UNIX Specification.

Harith
  • I've done it with pipes now and I can tell they have their advantages. They gave us 3 options to implement this and naive me thought starting with the first one and building up from there was going to be the easier choice, maybe because pipes were the newer concept in the course. – Neon Feb 26 '23 at 07:01
  • UNP is a mammoth of a book; this tones it down to just the basics: https://beej.us/guide/bgipc/ – Harith Feb 26 '23 at 12:36