prctl(PR_SET_PDEATHSIG) race condition

Question

As I understand it, the best way to have a child process terminated when its parent dies is prctl(PR_SET_PDEATHSIG) (at least on Linux): How to make child process die after parent exits?

There is one caveat to this mentioned in man prctl:

This value is cleared for the child of a fork(2) and (since Linux 2.4.36 / 2.6.23) when executing a set-user-ID or set-group-ID binary, or a binary that has associated capabilities (see capabilities(7)). This value is preserved across execve(2).

So, the following code has a race condition:

parent.c:

#include <unistd.h>

int main(int argc, char **argv) {
  pid_t f = fork();
  if (f == 0) {
    execl("./child", "child", (char *)NULL);
  }
  return 0;
}

child.c:

#include <sys/prctl.h>
#include <signal.h>

int main(int argc, char **argv) {
  prctl(PR_SET_PDEATHSIG, SIGKILL); // ignore error checking for now
  // ...
  return 0;
}

Namely, the parent could die before prctl() is executed in the child (and thus the child will not receive the SIGKILL). The proper way to address this is to call prctl() in parent.c, in the forked child, before the exec():

parent.c:

#include <unistd.h>
#include <sys/prctl.h>
#include <signal.h>

int main(int argc, char **argv) {
  pid_t f = fork();
  if (f == 0) {
    prctl(PR_SET_PDEATHSIG, SIGKILL); // ignore error checking for now
    execl("./child", "child", (char *)NULL);
  }
  return 0;
}

child.c:

int main(int argc, char **argv) {
  // ...
  return 0;
}

However, if ./child is a setuid/setgid binary, then this trick to avoid the race condition doesn't work (exec()ing the setuid/setgid binary causes the PDEATHSIG to be lost as per the man page quoted above), and it seems like you are forced to employ the first (racy) solution.

If child is a setuid/setgid binary, is there any way to prctl(PR_SET_PDEATHSIG) in a non-racy way?

I believe the answer is "no, _intentionally_", but I could be wrong. This is consistent with the general rule that the parent process can't mess with setuid children. — zwol, Feb 27 '17 at 22:00
@zwol I see. Is there any way for the child to verify that its (original) parent is still alive before it `prctl()`s? — Bailey Parker, Feb 27 '17 at 22:01
A partial solution is to go ahead and make the `prctl` call in the child. Immediately _afterward_, call `getppid()`. If the value returned is 1, you have missed the parent's death and should exit. Otherwise continue. This doesn't work if the program is started directly by `init`, though, and that's less rare than it used to be, and there's no race-free way that I know of to determine whether it was. — zwol, Feb 27 '17 at 22:03
Hm that seems like the best approach possible. Luckily, in my case, I always launch the setuid/gid child from a known parent (and over both of which I have source control). So, I know for certain that init will never launch the child. In terms of general solutions, it seems that you're right that there's no race-free way to determine if init was the original parent. But, this seems like the best approach possible. If you write it up, I'll mark you as the solver. — Bailey Parker, Feb 27 '17 at 22:24
I've written it up, but please wait at least 24 hours for someone cleverer to come along before accepting it. — zwol, Feb 27 '17 at 22:34
I don't recall ever seeing `prctl(PR_SET_PDEATHSIG, ...)` in a normal application myself, but I have seen quite a few variants of the control pipe method. It does require that the parent process sets up the pipe, and in POSIX, that the child regularly polls it. In Linux, the child can simply set the read end nonblocking and asynchronous, so the kernel will generate a signal when the parent process exits (or writes to the pipe), as hopefully shown [in my answer below](http://stackoverflow.com/a/42498370/1475978). Note that I spun this code up for this answer, so it is not well tested at all. — Nominal Animal, Feb 28 '17 at 00:45
Wouldn't your second example still have the same race as the first? The parent could die after the fork, but before the call to `prctl(PR_SET_PDEATHSIG, ...)`. — cmtm, Feb 13 '19 at 23:12
I still don't think it would work, because forking clears the PR_SET_PDEATHSIG setting. — cmtm, Feb 16 '19 at 01:48
True. It's been a while since I looked at this and I forgot the situations in which it is cleared. Even more reason to use the accepted answer! — Bailey Parker, Feb 16 '19 at 21:44

score 4 · Accepted Answer · answered Feb 28 '17 at 00:38

It is much more common to have the parent process set up a pipe. The parent process keeps the write end open (pipefd[1]) and closes the read end (pipefd[0]). The child process closes the write end (pipefd[1]) and sets the read end (pipefd[0]) nonblocking.

This way, the child process can use read(pipefd[0], buffer, 1) to check if the parent process is still alive. If the parent is still running, it will return -1 with errno == EAGAIN (or errno == EINTR). If the parent has exited, so that every write end is closed, it will return 0.
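As a rough sketch of that polling check (the helper names control_pipe_init and parent_alive are made up for illustration, and it assumes the child has already closed its own copy of the write end):

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make the read end of the control pipe nonblocking.
 * Returns 0 on success, -1 on error (errno set). */
static int control_pipe_init(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) ? -1 : 0;
}

/* Hypothetical helper: returns 1 if some process still holds the write end,
 * 0 if every write end is closed (the parent has exited),
 * -1 on an unexpected error (errno set). */
static int parent_alive(int fd)
{
    char buffer[1];
    ssize_t n;

    /* Retry if a signal interrupted the read. */
    do {
        n = read(fd, buffer, sizeof buffer);
    } while (n == -1 && errno == EINTR);

    if (n == 0)
        return 0;   /* end of file: no writer left */
    if (n > 0)
        return 1;   /* the parent wrote a byte, so it is alive */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 1;   /* nothing to read, but a writer still exists */
    return -1;
}
```

The child would call control_pipe_init(pipefd[0]) once after the fork, and parent_alive(pipefd[0]) whenever it wants to check.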

Now, in Linux, the child process can also set the read end async, in which case it will be sent a signal (SIGIO by default) when the parent process exits:

fcntl(pipefd[0], F_SETSIG, desired_signal);
fcntl(pipefd[0], F_SETOWN, getpid());
fcntl(pipefd[0], F_SETFL, O_NONBLOCK | O_ASYNC);

Use a siginfo handler for desired_signal. If info->si_code == POLL_IN && info->si_fd == pipefd[0], the parent process either exited or wrote something to the pipe. Because read() is async-signal safe, and the pipe is nonblocking, you can use read(pipefd[0], &buffer, sizeof buffer) in the signal handler whether the parent wrote something, or if parent exited (closed the pipe). In the latter case, the read() will return 0.

As far as I can see, this approach has no race conditions (if you use a realtime signal, so that the signal is not lost because a user-sent one is already pending), although it is very Linux-specific. After setting the signal handler, and at any point during the lifetime of the child process, the child can always explicitly check if the parent is still alive, without affecting the signal generation.

So, to recap, in pseudocode:

Construct pipe
Fork child process

Child process:
    Close write end of pipe
    Install pipe signal handler (say, SIGRTMIN+0)
    Set read end of pipe to generate pipe signal (F_SETSIG)
    Set own PID as read end owner (F_SETOWN)
    Set read end of pipe nonblocking and async (F_SETFL, O_NONBLOCK | O_ASYNC)
    If read(pipefd[0], buffer, sizeof buffer) == 0,
        the parent process has already exited.

    Continue with normal work.

Child process pipe signal handler:
    If siginfo->si_code == POLL_IN and siginfo->si_fd == pipefd[0],
        parent process has exited.
        To immediately die, use e.g. raise(SIGKILL).    

Parent process:
    Close read end of pipe

    Continue with normal work.

I do not expect you to believe my word.

Below is a crude example program you can use to check this behaviour yourself. It is long, but only because I wanted it to be easy to see what is happening at runtime. To implement this in a normal program, you only need a couple of dozen lines of code. example.c:

#define _GNU_SOURCE
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

static volatile sig_atomic_t done = 0;

static void handle_done(int signum)
{
    if (!done)
        done = signum;
}

static int install_done(const int signum)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_handler = handle_done;
    act.sa_flags = 0;
    if (sigaction(signum, &act, NULL) == -1)
        return errno;

    return 0;
}

static int  deathfd = -1;

static void death(int signum, siginfo_t *info, void *context)
{
    if (info->si_code == POLL_IN && info->si_fd == deathfd)
        raise(SIGTERM);
}

static int install_death(const int signum)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_sigaction = death;
    act.sa_flags = SA_SIGINFO;
    if (sigaction(signum, &act, NULL) == -1)
        return errno;

    return 0;
}

int main(void)
{
    pid_t  child, p;
    int    pipefd[2], status;
    char   buffer[8];

    if (install_done(SIGINT)) {
        fprintf(stderr, "Cannot set SIGINT handler: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    if (pipe(pipefd) == -1) {
        fprintf(stderr, "Cannot create control pipe: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    child = fork();
    if (child == (pid_t)-1) {
        fprintf(stderr, "Cannot fork child process: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    if (!child) {
        /*
         * Child process.
        */

        /* Close write end of pipe. */
        deathfd = pipefd[0];
        close(pipefd[1]);

        /* Set a SIGHUP signal handler. */
        if (install_death(SIGHUP)) {
            fprintf(stderr, "Child process: cannot set SIGHUP handler: %s.\n", strerror(errno));
            return EXIT_FAILURE;
        }

        /* Set SIGTERM signal handler. */
        if (install_done(SIGTERM)) {
            fprintf(stderr, "Child process: cannot set SIGTERM handler: %s.\n", strerror(errno));
            return EXIT_FAILURE;
        }

        /* We want a SIGHUP instead of SIGIO. */
        fcntl(deathfd, F_SETSIG, SIGHUP);

        /* We want the SIGHUP delivered when deathfd closes. */
        fcntl(deathfd, F_SETOWN, getpid());

        /* Make the deathfd (read end of pipe) nonblocking and async. */
        fcntl(deathfd, F_SETFL, O_NONBLOCK | O_ASYNC);

        /* Check if the parent process is dead. */
        if (read(deathfd, buffer, sizeof buffer) == 0) {
            printf("Child process (%ld): Parent process is already dead.\n", (long)getpid());
            return EXIT_FAILURE;
        }

        while (1) {
            status = __atomic_fetch_and(&done, 0, __ATOMIC_SEQ_CST);
            if (status == SIGINT)
                printf("Child process (%ld): SIGINT caught and ignored.\n", (long)getpid());
            else
            if (status)
                break;
            printf("Child process (%ld): Tick.\n", (long)getpid());
            fflush(stdout);
            sleep(1);

            status = __atomic_fetch_and(&done, 0, __ATOMIC_SEQ_CST);
            if (status == SIGINT)
                printf("Child process (%ld): SIGINT caught and ignored.\n", (long)getpid());
            else
            if (status)
                break;
            printf("Child process (%ld): Tock.\n", (long)getpid());
            fflush(stdout);
            sleep(1);
        }

        printf("Child process (%ld): Exited due to %s.\n", (long)getpid(),
               (status == SIGINT) ? "SIGINT" :
               (status == SIGHUP) ? "SIGHUP" :
               (status == SIGTERM) ? "SIGTERM" : "an unknown signal");
        fflush(stdout);

        return EXIT_SUCCESS;
    }

    /*
     * Parent process.
    */

    /* Close read end of pipe. */
    close(pipefd[0]);

    while (!done) {
        fprintf(stderr, "Parent process (%ld): Tick.\n", (long)getpid());
        fflush(stderr);
        sleep(1);
        fprintf(stderr, "Parent process (%ld): Tock.\n", (long)getpid());
        fflush(stderr);
        sleep(1);

        /* Try reaping the child process. */
        p = waitpid(child, &status, WNOHANG);
        if (p == child || (p == (pid_t)-1 && errno == ECHILD)) {
            if (p == child && WIFSIGNALED(status))
                fprintf(stderr, "Child process died from %s. Parent will now exit, too.\n",
                        (WTERMSIG(status) == SIGINT) ? "SIGINT" :
                        (WTERMSIG(status) == SIGHUP) ? "SIGHUP" :
                        (WTERMSIG(status) == SIGTERM) ? "SIGTERM" : "an unknown signal");
            else
                fprintf(stderr, "Child process has exited, so the parent will too.\n");
            fflush(stderr);
            break;
        }
    }

    if (done) {
        fprintf(stderr, "Parent process (%ld): Exited due to %s.\n", (long)getpid(),
                   (done == SIGINT) ? "SIGINT" :
                   (done == SIGHUP) ? "SIGHUP" : "an unknown signal");
        fflush(stderr);
    }

    /* Reached once the loop above ends. */
    return EXIT_SUCCESS;
}

Compile and run the above using e.g.

gcc -Wall -O2 example.c -o example
./example

The parent process will print to standard error, and the child process to standard output. The parent process will exit if you press Ctrl+C; the child process will ignore that signal. The child process uses SIGHUP instead of SIGIO (although a realtime signal, say SIGRTMIN+0, would be safer); if generated by the parent process exiting, the SIGHUP signal handler will raise SIGTERM in the child.

To make the termination causes easy to see, the child catches SIGTERM, and exits on the next iteration (up to a second later). If so desired, the handler can use e.g. raise(SIGKILL) to terminate the process immediately.

Both parent and child processes show their process IDs, so you can easily send a SIGINT/SIGHUP/SIGTERM signal from another terminal window. (The child process ignores SIGINT and SIGHUP sent from outside the process.)

十1 for solution that both avoids the race and manages to be more portable, less of a hack. — R.. GitHub STOP HELPING ICE, Feb 28 '17 at 02:43
And -1 for whichever idiotic mod forced us once again to use unicode homoglyphs to write a comment like that — R.. GitHub STOP HELPING ICE, Feb 28 '17 at 02:44
@R.., the `fcntl()` signal generation is quite Linux-specific, so it's not that portable. If the child process has a select()/poll() main loop, then it is trivial to include the pipe in that. Another possibility is to create a minimal cancelable helper thread, that has all signals blocked, and is itself in a blocking read from the pipe. If it ever reads a zero, it raises an appropriate signal. If the child process wishes to exit normally, it can just cancel the helper thread, as `read()` is a cancellation point (according to POSIX). — Nominal Animal, Feb 28 '17 at 04:36
I forgot to mention in my above answer that setting the pipe descriptors `O_CLOEXEC` in at least the parent is usually desirable, too. It is not related to the current question, but it ensures that if the parent forks and execs another process, the child process is notified that the parent exited. If one is paranoid about security, this ought to be a nice bonus in this approach. (Since the parent process ID does not change when it executes another binary, it is complicated for the child to otherwise ensure the parent process is still the same one that forked the child.) — Nominal Animal, Feb 28 '17 at 04:47
Agree completely on `O_CLOEXEC`. POSIX-future has `pipe2` which you should use to avoid a race condition. As for a signal, just reading the pipe is just as good if you control the code for the child process. Just have it create a thread whose sole purpose is to block on the pipe and `kill` the process when the pipe is closed, or handle it in some more elegant way. If you don't control the code in the child, you need the intermediary process approach, which I think someone described in one of the answers to the linked question. — R.. GitHub STOP HELPING ICE, Feb 28 '17 at 15:51
@R..: Agreed. (BTW, the first codebase I saw pipes used for generally similar exit detection was in DJB's Daemontools in late 90s, so this is definitely not "new".) This approach is also possible between unrelated but co-operating processes, if there is a third, helper process, connected via Unix domain sockets to the two: the pipe ends can be sent as ancillary messages. The tricky part is securely authenticating the Unix domain socket endpoints: each must verify the other peer (of the Unix domain socket) before and after the pipe descriptor transfer, to ensure no exec() trickery was afoot. — Nominal Animal, Feb 28 '17 at 16:13
'It is much more common to have the parent process set up a pipe.' Do you have a source for this? I mean many programs don't have to deal with setgid/setuid binaries and only target Linux, so `prctl(PR_SET_PDEATHSIG, ...)` is the perfect (short, reliable and easy to implement) solution for those cases. — maxschlepzig, Dec 06 '19 at 15:43
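The poll() main-loop variant mentioned in the comments above might look roughly like the sketch below. The function name wait_for_work and the second descriptor workfd are invented for illustration; workfd stands in for whatever the child normally waits on.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for input on workfd, while also
 * watching deathfd, the read end of the control pipe.
 * Returns  1 if workfd has data to read,
 *          0 on timeout or an unrelated wakeup,
 *         -1 if the parent is gone (all write ends closed) or on error. */
static int wait_for_work(int deathfd, int workfd, int timeout_ms)
{
    struct pollfd fds[2];
    int n;

    fds[0].fd = deathfd; fds[0].events = POLLIN; fds[0].revents = 0;
    fds[1].fd = workfd;  fds[1].events = POLLIN; fds[1].revents = 0;

    /* Retry if a signal interrupted the wait (this restarts the timeout). */
    do {
        n = poll(fds, 2, timeout_ms);
    } while (n == -1 && errno == EINTR);
    if (n == -1)
        return -1;
    if (n == 0)
        return 0;

    /* A hangup on the control pipe means every write end is closed. */
    if (fds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
        return -1;

    /* The parent wrote something: drain it, and treat EOF as parent death. */
    if (fds[0].revents & POLLIN) {
        char buf[64];
        if (read(deathfd, buf, sizeof buf) == 0)
            return -1;
    }

    return (fds[1].revents & POLLIN) ? 1 : 0;
}
```

This needs no signals at all, so it avoids the Linux-specific fcntl() calls, at the cost of only noticing the parent's death when the child is back in its main loop.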

score 2 · Answer 2 · answered Dec 06 '19 at 15:35

Your last code snippet still contains a race condition:

int main(int argc, char **argv) {
  pid_t f = fork();
  if (f == 0) {
    // <- !!!race time!!!
    prctl(PR_SET_PDEATHSIG, SIGKILL); // ignore error checking for now
    execl("./child", "child", (char *)NULL);
  }
  return 0;
}

That is, in the child, between the fork() and the point where prctl() takes effect (think: returns), there is a time window in which the parent may exit.

To fix this race you have to save the PID of the parent before the fork and check it after the prctl() call, e.g.:

pid_t ppid_before_fork = getpid();
pid_t pid = fork();
if (pid == -1) { perror(0); exit(1); }
if (pid) {
    ; // continue parent execution
} else {
    int r = prctl(PR_SET_PDEATHSIG, SIGTERM);
    if (r == -1) { perror(0); exit(1); }
    // test in case the original parent exited just
    // before the prctl() call
    if (getppid() != ppid_before_fork)
        exit(1);
    // continue child execution ...
}

Regarding executing a setuid/setgid program: You can then pass the ppid_before_fork by other means (e.g. in the argument or environment vector) and execute the prctl() (including the comparison) after the exec, i.e. inside the execed binary.
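A minimal sketch of what the exec'ed binary could do, assuming the parent passes its PID as a decimal string in argv[1]. That convention and the helper names parse_pid and arm_pdeathsig are my own choices for illustration, not anything prctl() prescribes:

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: parse a decimal PID string.
 * Returns (pid_t)-1 if the string is empty, malformed, or out of range. */
static pid_t parse_pid(const char *s)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return (pid_t)-1;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0 || v != (long)(pid_t)v)
        return (pid_t)-1;
    return (pid_t)v;
}

/* Hypothetical helper: arm the parent-death signal, then check that the
 * parent is still the process we expect.
 * Returns 0 if armed and the expected parent is still our parent,
 * -1 if prctl() failed or the original parent has already exited. */
static int arm_pdeathsig(pid_t expected_parent, int sig)
{
    if (prctl(PR_SET_PDEATHSIG, (unsigned long)sig) == -1)
        return -1;
    /* The check must come after prctl(), otherwise the race remains. */
    if (getppid() != expected_parent)
        return -1;
    return 0;
}
```

The set-id binary would start with something like arm_pdeathsig(parse_pid(argv[1]), SIGTERM) and exit if it returns -1. On the parent side, the PID string has to be formatted with getpid() before the fork, or with getppid() in the forked child before the exec, so that it names the original parent.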

I agree with you that @Bailey's code contains a race condition. Since PIDs can be reused, your solution is not reliable either. — beroal, Dec 25 '21 at 09:46
@beroal You are mistaken. It doesn't matter that PIDs can be reused for the reliable parent death detection. Even if the parent has died and its PID gets reused by another process while the else-branch is executed - so what? The process that re-uses that PID can't just adopt our child process. Thus, it can't possibly influence the child's `getppid()` call. Note that when the parent dies the child process is orphaned and adopted by the configured reaper (usually PID 1, cf. `PR_SET_CHILD_SUBREAPER`). Thus, in that case `getppid()` then returns the PID of a long-running reaper process. — maxschlepzig, Dec 26 '21 at 20:56

score 1 · Answer 3 · answered Feb 27 '17 at 22:31

I don't know this for sure, but clearing the parent death signal on execve when invoking a set-id binary looks like an intentional restriction for security reasons. I'm not sure why, considering that you can use kill to send signals to setuid programs that share your real user ID, but they wouldn't have bothered making that change in 2.6.23 if there wasn't a reason to disallow it.

Since you control the code of the set-id child, here is a kludge: make the call to prctl, then immediately afterward, call getppid and see if it returns 1. If it does, then either the process was started directly by init (which is not as rare as it used to be) or the process was reparented to init before it had a chance to call prctl, which means its original parent is dead and it should exit.

(This is a kludge because I know of no way to rule out the possibility that the process was started directly by init. init never exits, so you have one case where it should exit and one case where it shouldn't and no way to tell which. But if you know from the larger design that the process will not be started directly by init, it should be reliable.)

(You must call getppid after prctl, or you have only narrowed the race window, not eliminated it.)

If the issue is important enough, maybe one could, by design, have the parent pass its pid as an argument to the child. — Mark Plotnick, Feb 27 '17 at 23:23
Even worse, you [can't rely on the subreaper process having PID 1](https://manpath.be/f30/2/prctl#L68). Thus, a more robust approach is to call `getpid()` before the fork, store its return value, and compare it in the child after the `prctl()` call with the `getppid()` return value (cf. [my answer](https://stackoverflow.com/a/59216119/427158)). — maxschlepzig, Dec 06 '19 at 15:49
