I have a C program that forks a child process at some point in a loop. The child process waits for the parent process to finish its job (some numerical calculations). If things go wrong, the parent process aborts, and the child process should continue from the state it had when it was forked and retry the calculation with some modifications. Otherwise, the parent keeps running, and the child process should be killed.
The parent and child processes communicate through a memory-mapped file. Only one byte of it is meaningful: a character that indicates the status of the parent process.
The file is created and mapped like this:
    char child_flag[] = "W";
    fp1 = fopen("child_interface.dat", "wb");
    // the interface file has two bytes, but only one is meaningful to the program
    fwrite(child_flag, 1, sizeof(child_flag), fp1);
    fclose(fp1);
    printf("child_interface.dat created\n");
    if (mmap_child_flag() == 0) {
        printf("memory map of parent-child interface successful.\n");
        fflush(stdout);
    }
The wait loop in the child process looks like this:
    child_pid = fork();
    if (child_pid == 0) { /* child process, wait for parent process to finish */
        mmap_child_flag();
        while (child_file[0] == 'W') { // Child waits
            usleep(100000);
        }
        if (child_file[0] == 'R') { // run child process (as a new parent process)
            child_file[0] = 'W';
            goto label2;
        }
        if (child_file[0] == 'K') { // Kill child process
            exit(0);
        }
    }
The problem is that the child process seems to get stuck in the sleeping while loop, even after the parent process has set the status to 'K' (I checked this in the file that is memory mapped). This code has been run on several Linux-based supercomputers, and the behavior is very inconsistent. On some platforms it runs smoothly, but on others it constantly gets stuck in the while loop. Sometimes, if I add some statements inside the while loop after the usleep call, it then runs just fine.
However, I'm not sure that the sleeping while loop is the root cause of this problem. My guess is that, because the process has almost nothing to do except check a byte in memory, the system lets it sleep all the time and somehow "forgets" to let it check the memory. Can such a thing happen on a Linux system?
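One way to test that guess in isolation is a small experiment that keeps the same polling loop but takes the file out of the picture. The following is only a minimal sketch under assumptions: the function name `run_flag_experiment` is hypothetical, the flag lives in an anonymous shared mapping (`MAP_SHARED | MAP_ANONYMOUS`) instead of `child_interface.dat`, and the flag is read through a `volatile` pointer so that every loop iteration reloads it. It is not the code from my program.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and usleep() on glibc */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical experiment: returns the child's exit code
 * (0 if the child observed 'K'), or -1 on error. */
int run_flag_experiment(void)
{
    /* Anonymous shared mapping: shared by parent and child after fork(),
     * with no file system involved. volatile forces a fresh load on every
     * read; it is not a general synchronization primitive. */
    volatile char *flag = mmap(NULL, 1, PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (flag == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    flag[0] = 'W';

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        munmap((void *)flag, 1);
        return -1;
    }

    if (pid == 0) {
        /* Child: same polling pattern as in the original wait loop. */
        while (flag[0] == 'W') {
            usleep(100000);
        }
        _exit(flag[0] == 'K' ? 0 : 1);
    }

    /* Parent: let the child go to sleep a few times, then release it. */
    usleep(350000);
    flag[0] = 'K';

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        munmap((void *)flag, 1);
        return -1;
    }
    munmap((void *)flag, 1);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If this returns 0 on a platform where the real program hangs, the scheduler is probably not "forgetting" the child, and the file mapping becomes the more likely suspect. If the call itself never returns, the sleeping loop is reproduced without any file involved.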
This is the function that does the actual mapping:
    /* Memory map for parent-child processes interface */
    int mmap_child_flag()
    {
        int fd_child;
        struct stat st_child;
        // open files
        if ((fd_child = open("child_interface.dat", O_RDWR)) == -1) {
            perror("open child_interface.dat");
            exit(1);
        }
        // stat
        if (stat("child_interface.dat", &st_child) == -1) {
            perror("stat of child_interface.dat");
            exit(1);
        }
        // map, child_file is a global char pointer
        child_file = mmap(0, st_child.st_size, PROT_WRITE, MAP_SHARED, fd_child, 0);
        if (child_file == (char *)(-1)) {
            perror("mmap child_interface.dat");
            exit(1);
        }
        return 0;
    }
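For comparison, here is a hedged sketch of a stricter variant of the mapping function. It is not what my program currently does. The name `map_flag_file` and its `path` parameter are hypothetical. The differences are: the mapping requests `PROT_READ | PROT_WRITE` (the wait loop reads the flag, and read access with `PROT_WRITE` alone is not guaranteed on every platform), the result is compared against `MAP_FAILED`, the size comes from `fstat` on the already open descriptor, the descriptor is closed once the mapping exists, and the pointer is `volatile`-qualified so that each read of the flag is a real load.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical variant of mmap_child_flag(): maps the file at `path`
 * and returns a pointer to its first byte, or NULL on failure. */
volatile char *map_flag_file(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd == -1) {
        perror("open");
        return NULL;
    }

    /* Query the size through the open descriptor. */
    struct stat st;
    if (fstat(fd, &st) == -1) {
        perror("fstat");
        close(fd);
        return NULL;
    }
    if (st.st_size < 1) {
        fprintf(stderr, "%s is empty, nothing to map\n", path);
        close(fd);
        return NULL;
    }

    /* Request both read and write access, since the flag is polled
     * (read) by the child and updated (written) by both processes. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);

    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return p;
}
```

With this variant the caller would assign the result to the global, for example `child_file = map_flag_file("child_interface.dat");`, which assumes `child_file` is declared as `volatile char *`.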