I have a C program that forks a child process at some point in a loop. The child process waits for the parent process to finish its job (some numerical calculations). If things go wrong, the parent process aborts, and the child process should continue from the state it was in when it was forked and retry the calculation with some modifications. Otherwise, the parent keeps running, and the child process should be killed.

The parent and child processes communicate through a memory-mapped file, in which a single byte (a character) indicates the status of the parent process.

The memory map is done like this

    char child_flag[]="W";
    
    fp1 = fopen( "child_interface.dat","wb");
    // the interface file has two bytes, but only one is meaningful to the program
    fwrite(child_flag, 1, sizeof(child_flag), fp1); 
    fclose(fp1);
    printf("child_interface.dat created\n");
    
    if(mmap_child_flag() ==0) {
        printf("memory map of parent-child interface successful.\n");
        fflush(stdout);
    }

The wait loop in the child process is like this

    child_pid = fork();
    if (child_pid ==0) { /* child process, wait for parent process to finish*/

        mmap_child_flag();

        while(child_file[0]=='W' ){  //Child waits
            usleep(100000);
        }
        if(child_file[0]=='R'){ // run child process (as a new parent process)
            child_file[0]='W';
            goto label2;
        }
        if(child_file[0]=='K'){ //Kill child process
            exit(0);
        }
    }

The problem is that the child process seems to get stuck in the sleep while loop, even when the parent process has set the status to 'K' (checked in the file that is memory mapped). This code has been run on several Linux-based supercomputers, and the behavior seems very inconsistent. On some platforms it runs smoothly, but on others it constantly gets stuck in the while loop. Sometimes, if I add some statements inside the while loop after the usleep call, it then runs just fine.

However, I'm not sure if the sleep while loop is the root cause of this problem. My guess is that because the process has almost nothing to do except check a byte in memory, the system lets it sleep all the time and somehow "forgets" to let it check the memory. Can such a thing happen on a Linux system?

This is the function that does the actual mapping

    /* Memory map for parent-child processes interface */
    int mmap_child_flag()
    {
        int fd_child;
        struct stat st_child;

        // open files
        if ((fd_child = open("child_interface.dat", O_RDWR)) == -1){
            perror("open child_interface.dat");
            exit(1);
        }
        // stat
        if (stat("child_interface.dat", &st_child) == -1){
            perror("stat of child_interface.dat");
            exit(1);
        }
        // map, child_file is global char array
        child_file = mmap(0, st_child.st_size, PROT_WRITE, MAP_SHARED, fd_child, 0);
        if (child_file == (char *)(-1)) {
            perror("mmap child_interface.dat");
            exit(1);
        }
        return 0;
    }

  • As soon as a child process is spawned, it gets its own copy of the `child_file` array. Whatever the parent does to its own copy is not reflected in the child's copy. You need to look into inter-process communication techniques, or switch to threads. – Serge Jun 18 '21 at 02:38
  • How is `child_file` set in `main`? – SGeorgiades Jun 18 '21 at 02:38
  • @SGeorgiades I've added this information in the description. I also forgot to mention that the child process also maps the file after it is forked. – Ballistic Grin Jun 18 '21 at 02:59
  • @Serge I forgot to mention that the child process also redoes the mmap of the interface file after it is forked. Does this change the behavior? – Ballistic Grin Jun 18 '21 at 03:00
  • @user556071 Assuming that you use one of the `MAP_SHARED` flags in mapping, it could work. You might also need to use `msync()`. – Serge Jun 18 '21 at 12:54
  • @Serge Yes, I was using the `MAP_SHARED` flag. What I don't understand is the need for `msync()` (which the code's behavior also seems to imply). A shared mapping is supposed to make updates visible to all processes mapping the same file, without reference to the actual file on disk. – Ballistic Grin Jun 18 '21 at 13:40
  • I think you're going to need to provide an MCVE ([Minimal, Complete, Verifiable Example](https://stackoverflow.com/help/mcve) — or MRE or whatever name SO now uses) or an SSCCE ([Short, Self-Contained, Correct Example](http://sscce.org/)) — the same idea by a different name. The declaration of `child_flag` in the first fragment should be unrelated to the declaration of `child_file` used in the loop, which might be a sufficient explanation of the trouble. But assuming it isn't, we need to see a minimized version of your code that reproduces the problem. – Jonathan Leffler Jun 18 '21 at 13:40

2 Answers

A shared memory region is good for sharing a lot of data, but it is a bad way to communicate between processes. The reason is that you can't get a notification that something has been changed, nor do you get a notification if the other user of the shared memory died.

To communicate between two processes, use pipe() if you need to create a one-way communication channel, or if you need bidirectional communication, use socketpair(). You can use poll() to wait for the other side to send some data. You will also get notified if the process on the other side terminated.
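As a rough illustration of the `pipe()` plus `poll()` idea, here is a minimal sketch of the receiving side. The helper name `wait_for_command` and the two negative return codes are hypothetical, chosen only for this example; the one-byte commands follow the question's `'R'`/`'K'` convention.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical return codes for this sketch (not part of any library). */
enum { CMD_TIMEOUT_OR_ERROR = -1, CMD_PEER_CLOSED = -2 };

/* Wait up to timeout_ms milliseconds for one command byte on fd.
 * Returns the byte value (0..255), CMD_PEER_CLOSED if the writer closed
 * the pipe without sending anything more, or CMD_TIMEOUT_OR_ERROR. */
int wait_for_command(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    /* poll() returns 0 on timeout and -1 on error. */
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return CMD_TIMEOUT_OR_ERROR;

    unsigned char cmd;
    ssize_t n = read(fd, &cmd, 1);
    if (n == 1)
        return cmd;               /* a command byte arrived */
    if (n == 0)
        return CMD_PEER_CLOSED;   /* EOF: every write end has been closed */
    return CMD_TIMEOUT_OR_ERROR;
}
```

With a pipe created by `pipe(fds)` before the `fork()`, the child would close `fds[1]` and call `wait_for_command(fds[0], -1)` to block without a timeout; the parent would write a single byte to `fds[1]`. The child has to close its own copy of the write end, otherwise it never sees EOF when the parent exits.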

You were using a loop like this:

    while(child_file[0]=='W' ){  //Child waits
        usleep(100000);
    }

This is bad, since you are wasting on average 50 ms of time that you could have spent doing something useful. Apart from that, there is also the problem that both the compiler and the CPU can sometimes change the order in which things are written to memory. If you have more data in child_file than just the flag at the start, then this might be an issue, unless you use atomics or explicit barriers.

G. Sliepen

The problem is that the child process seems to get stuck in the sleep while loop, even when the parent process has set the status to 'K' (checked in the file that is memory mapped).

There are several odd things about your program, with one of them being that you are using shared memory for this task at all. See below for a better approach.

Issues with the current approach

As to the question as it stands, however, you have a synchronization problem. The contents of the mapped memory are being changed outside the scope of the child process, but you've given it no reason to suspect that that might be the case. The compiler can therefore assume that if the wait loop condition is satisfied when it is first evaluated, then it will be satisfied on every subsequent evaluation, too.

For a more complicated interaction, you might need to set up a process-shared mutex or similar to guard access to the shared memory, but for this, it would probably be sufficient to declare child_file as a pointer to volatile char.
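A minimal sketch of the `volatile` variant, under some assumptions: the helper names `map_flag` and `wait_for_flag_change` are made up for this example, only the first byte of the file is mapped, and the mapping is opened with `PROT_READ | PROT_WRITE` because the flag is both read and written.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first byte of an existing, non-empty file as a shared status
 * flag. Returns NULL on failure. */
volatile char *map_flag(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return NULL;

    void *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (volatile char *)p;
}

/* Poll until the flag no longer equals `waiting`, then return its value.
 * Because the pointee is volatile, the compiler must reload *flag on
 * every iteration instead of reusing the first value it read. */
char wait_for_flag_change(volatile char *flag, char waiting)
{
    char value;
    while ((value = *flag) == waiting)
        usleep(100000);   /* same 100 ms interval as in the question */
    return value;
}
```

In the question's program this would correspond to declaring the global as `volatile char *child_file;`. Note that `volatile` only forces the loads to happen; it does not order accesses to any other shared data, which is why a more complicated interaction would call for a mutex or atomics.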

A better approach

You want the child to wait for a one- or maybe two-byte instruction from the parent. You presently do this by polling the contents of a shared memory segment, but that's complex to set up and use, as you discovered. It would be a lot easier to use a pipe to convey the needed information from parent to child:

  • setup: Declare a two-element int array to hold the pipe's file descriptors. Call pipe() before forking.
  • child use: The child performs a blocking read() on the pipe.
  • parent use: write() the message to the pipe when ready, then close it. Or just close it.

Note that the pipe itself then provides adequate synchronization, and that there is no need for a wait loop. Note also that the child can detect the case that the parent dies without sending any message, which your shared memory approach does not support.
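The steps above could be sketched roughly as follows. The function names (`fork_standby`, `promote_standby`, `dismiss_standby`) are hypothetical, and so is the policy that the child exits on anything other than `'R'`, including EOF. If the parent may crash without sending a message and the child should take over in that case, the EOF branch would need to return instead of exiting.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: block until one byte arrives or the pipe is closed.
 * Returns the byte, or -1 on EOF or error. */
static int wait_for_parent(int read_fd)
{
    char msg;
    ssize_t n;
    do {
        n = read(read_fd, &msg, 1);
    } while (n == -1 && errno == EINTR);   /* retry if a signal interrupted us */
    return n == 1 ? (unsigned char)msg : -1;
}

/* Fork a standby child.
 * In the parent: returns the child's pid and stores the write end of the
 * pipe in *write_fd. In the child: returns 0, but only after the parent
 * has sent 'R'; on any other message or EOF the child exits.
 * Returns -1 if pipe() or fork() fails. */
pid_t fork_standby(int *write_fd)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[1]);             /* otherwise the child never sees EOF */
        int msg = wait_for_parent(fds[0]);
        close(fds[0]);
        if (msg != 'R')
            _exit(0);              /* 'K', EOF, or error: not needed */
        return 0;                  /* take over as the new parent */
    }
    close(fds[0]);
    *write_fd = fds[1];
    return pid;
}

/* Parent, failure path: tell the child to take over. Returns 0 on success. */
int promote_standby(int write_fd)
{
    ssize_t n = write(write_fd, "R", 1);
    close(write_fd);
    return n == 1 ? 0 : -1;
}

/* Parent, success path: just close the pipe so the child sees EOF and
 * exits, then reap it. Returns the child's exit status, or -1. */
int dismiss_standby(pid_t pid, int write_fd)
{
    int status;
    close(write_fd);
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In the question's loop, `fork_standby()` would replace the `fork()` call and the whole wait loop: a return value of 0 plays the role of `goto label2`, and the parent calls `promote_standby()` before aborting or `dismiss_standby()` when its calculation succeeds.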

John Bollinger
  • Mostly correct, except the compiler cannot assume the loop condition is satisfied on subsequent evaluations if there is any function call in the loop that might have side effects, such as any I/O or `usleep()`. – G. Sliepen Jun 18 '21 at 14:08
  • @G.Sliepen, compilers can do all sorts of things, including some surprising ones, and occasionally including even wrong ones. But one of the things a compiler might do is use knowledge of the behavior of library functions, whether built in or recorded in some way in function declarations (e.g. GNU C function attributes), to ascertain that a given function *does not* have side effects. – John Bollinger Jun 18 '21 at 14:15
  • @JohnBollinger Thank you for the answer! Adding the `volatile` keyword for `child_file` does the trick. I tested in a smaller piece and the synchronization seems to be much faster. So this is the compiler's "fault" that the value in memory is not actually checked? I read some [explanation about the `volatile` keyword](https://stackoverflow.com/questions/246127/why-is-volatile-needed-in-c), and found out that it is indeed designed for such situations. My other question is, would using `pipe()` be faster than the `mmap()` method (given that `volatile` is used)? – Ballistic Grin Jun 19 '21 at 00:32
  • @user556071, in addition to being simpler, cleaner, and able to handle some situations that `mmap()` can't, it is possible, yes, that the pipe solution would be faster, too. This is because in the mmap case, the child will, on average, be halfway through a `usleep()` when the parent updates shared memory, and it will complete the sleep before checking for a change. In the pipe case, the child can respond nearly immediately. – John Bollinger Jun 19 '21 at 02:16