
I have a program that runs the same code both sequentially and in parallel at the same time between checkpoints placed throughout the program. To do this, the initial program forks off a child process which runs sequentially whilst the parent process runs in parallel. Whichever process reaches the next checkpoint first kills the other process, and this repeats until the end of execution, so each section finishes as fast as its quicker variant allows (ignoring fork and copy overheads).

I have implemented this and everything works fine when only sequential execution is the fastest, or only parallel execution is the fastest, but when the first few sections execute fastest in parallel followed by the next few executing fastest sequentially the program stalls with both processes sleeping. I cannot see what could be causing this. Have I hit some limit imposed on forking processes? Can killing a parent process affect the execution of a forked child process or vice versa?

The code below gives my function checkPoint() which is executed at every checkpoint, with some custom code handling the end of the final section. Parallel code is implemented using OpenMP with sequential code never encountering an OpenMP pragma statement.

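#include <signal.h>     // kill, SIGKILL
#include <sys/types.h>  // pid_t
#include <unistd.h>     // fork, getpid

pid_t parent = 0;  // 0 until the first checkpoint is reached
pid_t child = 0;   // fork() result: 0 in the child, the child's PID in the parent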

void checkPoint() {

    if (parent == 0) {

        // First Time
        parent = getpid();

    } else if (child == 0) {

        // Child Process
        kill(parent, SIGKILL);
        parent = getpid();

    } else {

        // Parent Process
        kill(child, SIGKILL);

    }

    child = fork();

}

Thanks,

Dan

divot
  • What do you mean by "parent thread"? The child you forked is a process, never a thread; there is a very distinct difference between processes and threads. Have a look here: http://stackoverflow.com/questions/200469/what-is-the-difference-between-a-process-and-a-thread To clarify other things: have you used waitpid()? – Barath Ravikumar Mar 02 '13 at 08:20
  • If you're not calling `wait()` to clean up the zombies, you may be running into `RLIMIT_NPROC`. – Barmar Mar 02 '13 at 08:29
  • I meant process not thread, updated accordingly. I'm not calling wait or waitpid, and adding them in after the kill statements does not change anything. If I understand correctly, wait can only be called by the parent waiting for the children to exit but in my case the parent is being/has been killed. Waitpid I can see possibly helping if there were leftover zombie processes but there are none according to top, just a child process (i.e. not the original process) and its direct child process, both sleeping. – divot Mar 02 '13 at 08:43
  • ironically, you have a [Race Condition](http://en.wikipedia.org/wiki/Race_condition) – dudeofea Mar 02 '13 at 08:51
  • also, does manually killing one of the processes in your terminal fix the problem? – dudeofea Mar 02 '13 at 08:53
  • I'm not seeing what the race condition is. Is it that my forked process is being killed before it has had time to properly start or something along those lines? – divot Mar 02 '13 at 08:56
  • Manually killing the sleeping processes does not help. If I kill the child I end up with a zombie and the parent. If I kill the parent I end up with just the child. – divot Mar 02 '13 at 09:00
  • We are going to need a lot more of your code. Start ripping it down to bare bones. Also, you need to understand that both processes run simultaneously and each one could stop and pause at any instruction, so you would need mutually exclusive locks on checkPoint. – Myforwik Mar 02 '13 at 10:56

1 Answer


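It turns out that this is a known problem when combining gcc, OpenMP and fork(): a forked child that enters an OpenMP region after its parent has already used OpenMP can hang. That would fit the failing pattern: once the parallel parent has won a few sections it has used OpenMP, so when the sequential child later wins and becomes the new "parent", its first parallel section runs in a process forked from one that had already used OpenMP.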

Taken from http://bisqwit.iki.fi/story/howto/openmp/#OpenmpAndFork

If your program intends to become a background process using daemonize() or other similar means, you must not use the OpenMP features before the fork. After OpenMP features are utilized, a fork is only allowed if the child process does not use OpenMP features, or it does so as a completely new process (such as after exec()).
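One way to stay within that rule is to let a supervisor process that never enters an OpenMP region fork both variants of a section, so that neither worker inherits used OpenMP state. The following is only a minimal sketch of that idea, not the code from the question: `race_section`, `spawn_worker`, `slow_sequential_section` and `fast_parallel_section` are hypothetical names, the two demo sections are placeholders, and the sketch assumes the supervisor has no other children. It also does not carry the winner's state on to the next section, which the original design gets from the fork copy and which would need pipes or shared memory here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*section_fn)(void);

/* Fork a worker that runs fn and exits. Returns the worker's PID, or -1. */
static pid_t spawn_worker(section_fn fn) {
    pid_t pid = fork();
    if (pid == 0) {
        fn();
        _exit(0);   /* never return into the supervisor's code */
    }
    return pid;
}

/*
 * Run both variants of one section in separate children of the caller.
 * The caller must not have used OpenMP itself (assumption of this sketch).
 * Returns 0 if the sequential worker finished first, 1 if the parallel
 * worker finished first, and -1 on error.
 */
int race_section(section_fn sequential, section_fn parallel) {
    pid_t seq = spawn_worker(sequential);
    if (seq < 0) return -1;

    pid_t par = spawn_worker(parallel);
    if (par < 0) {
        kill(seq, SIGKILL);
        waitpid(seq, NULL, 0);
        return -1;
    }

    /* Block until the first worker terminates, retrying if interrupted. */
    int status = 0;
    pid_t winner;
    do {
        winner = wait(&status);
    } while (winner < 0 && errno == EINTR);

    if (winner < 0) {
        kill(seq, SIGKILL);
        kill(par, SIGKILL);
        waitpid(seq, NULL, 0);
        waitpid(par, NULL, 0);
        return -1;
    }

    /* Kill the slower worker and reap it so no zombie is left behind. */
    pid_t loser = (winner == seq) ? par : seq;
    kill(loser, SIGKILL);
    waitpid(loser, NULL, 0);

    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    return (winner == par) ? 1 : 0;
}

/* Placeholder sequential section: pretends to be slow by sleeping. */
void slow_sequential_section(void) {
    sleep(1);
}

/* Placeholder parallel section: a small OpenMP reduction. */
void fast_parallel_section(void) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000000; ++i) {
        sum += i * 0.5;
    }
    (void)sum;
}
```

Calling `race_section(slow_sequential_section, fast_parallel_section)` from a `main()` that contains no OpenMP pragmas runs one section this way. Built with `gcc -fopenmp` the loop runs in parallel; without that flag the pragma is ignored and the sketch still compiles.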

divot