
For child processes, the wait() and waitpid() functions can be used to suspend execution of the current process until a child has exited. But these functions cannot be used for non-child processes.

Is there another function that can wait for the exit of any process?

chaos
CsTamas

14 Answers


Nothing equivalent to wait(). The usual practice is to poll using kill(pid, 0) and look for a return value of -1 with errno set to ESRCH, which indicates that the process is gone.
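A minimal sketch of that polling loop in C, assuming Linux/POSIX; the helper name `wait_for_exit_kill` and the caller-chosen interval are hypothetical. Following the comments below, it sleeps with nanosleep() between probes and treats EPERM as "still running" (the process exists but belongs to another user). Two caveats: a zombie still exists as far as kill() is concerned, so the loop only returns once the target's parent has reaped it, and the PID-reuse race discussed in the comments is not addressed.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: poll with kill(pid, 0) until `pid` no longer exists.
 * Returns 0 once the PID is gone (ESRCH), -1 on bad arguments or other errors. */
int wait_for_exit_kill(pid_t pid, long interval_ms)
{
    /* kill() treats 0 and negative PIDs as process groups, so reject them. */
    if (pid <= 0 || interval_ms <= 0)
        return -1;

    struct timespec ts = { interval_ms / 1000, (interval_ms % 1000) * 1000000L };
    for (;;) {
        if (kill(pid, 0) == -1) {
            if (errno == ESRCH)
                return 0;       /* no such process: it is gone */
            if (errno != EPERM)
                return -1;      /* unexpected error */
            /* EPERM: the process exists, we just may not signal it */
        }
        nanosleep(&ts, NULL);   /* back off before probing again */
    }
}
```

For example, `wait_for_exit_kill(pid, 100)` probes ten times per second; a longer interval costs less CPU but notices the exit later.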

Update: Since Linux kernel 5.3 there is a pidfd_open syscall, which creates an fd for a given pid that can be polled to get a notification when the process has exited.

xonatius
chaos
  • Is it ok to have such a busy loop? – CsTamas Jul 21 '09 at 07:37
  • Well, you don't want to make it too busy; you should `usleep()` for a while after each `kill()` that doesn't find the process gone. Then you have to strike a balance between how busy your polling is and how long it's okay for the process to be gone before you notice. – chaos Jul 21 '09 at 07:41
  • Oh, `usleep()` became obsolete while I wasn't looking, apparently. Seems you should now `nanosleep()` instead. – chaos Jul 21 '09 at 07:43
  • Maybe you wanted to write kill(pid, 0). It's int kill(pid_t pid, int sig). – Metiu Jun 22 '10 at 14:17
  • @Metiu: Yeah, thanks. The inverted argument order in Perl and shell is always getting me. – chaos Jun 23 '10 at 13:54
  • Note that if the process is owned by a different user, you have to be slightly clever and check whether you get EPERM: Operation not permitted (process is running) vs ESRCH: No such process (process exited). Credit for this idea goes to [ysth](http://stackoverflow.com/questions/5137187/perl-waiting-for-non-child-process-to-exit#comment-5764962). – daxelrod Oct 23 '11 at 20:36
  • @Sam Hocevar: And nothing about what the race condition consists of or how to do this without it. Not really helping. – chaos Jan 10 '12 at 15:34
  • @chaos: Nothing guarantees that `kill(pid, 0)` will signal the process you are interested in. It could have died and been replaced by another running process during your call to `nanosleep`. I'm afraid I don't feel the need to elaborate more: three good suggestions have been made (the FIFO, the semaphore, and the `ptrace` approach which IMHO is superior to all others despite being very platform-specific). – sam hocevar Jan 10 '12 at 16:14

So far I've found three ways to do this on Linux:

  • Polling: you check for the existence of the process every so often, either by using kill or by testing for the existence of /proc/$pid, as in most of the other answers.
  • Use the ptrace system call to attach to the process like a debugger so you get notified when it exits, as in a3nm's answer.
  • Use the netlink interface to listen for PROC_EVENT_EXIT messages: this way the kernel tells your program every time a process exits, and you just wait for the right process ID. I've only seen this described in one place on the internet.
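A rough sketch of the netlink option in C, assuming a kernel built with CONFIG_PROC_EVENTS and a caller holding CAP_NET_ADMIN (usually root); `wait_for_exit_netlink` and `proc_events_listen` are hypothetical names. It subscribes to the proc connector with PROC_CN_MCAST_LISTEN, then checks that the target still exists (so an exit between the two steps is not missed), and then reads events until the thread-group leader with that PID reports PROC_EVENT_EXIT. Other threads of the target may briefly outlive the leader, and if the socket buffer overflows recv() fails with ENOBUFS; this sketch simply reports an error in that case.

```c
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/cn_proc.h>
#include <linux/connector.h>
#include <linux/netlink.h>

/* Tell the proc connector to start multicasting process events. */
static int proc_events_listen(int sock)
{
    union {
        struct nlmsghdr hdr;    /* gives the buffer nlmsghdr alignment */
        char raw[NLMSG_SPACE(sizeof(struct cn_msg) + sizeof(enum proc_cn_mcast_op))];
    } req;
    enum proc_cn_mcast_op op = PROC_CN_MCAST_LISTEN;

    memset(&req, 0, sizeof req);
    req.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(struct cn_msg) + sizeof op);
    req.hdr.nlmsg_type = NLMSG_DONE;

    struct cn_msg *cn = NLMSG_DATA(&req.hdr);
    cn->id.idx = CN_IDX_PROC;
    cn->id.val = CN_VAL_PROC;
    cn->len = sizeof op;
    memcpy(cn->data, &op, sizeof op);

    return send(sock, &req, req.hdr.nlmsg_len, 0) < 0 ? -1 : 0;
}

/* Hypothetical helper: block until `pid` exits. Returns 0 once it has exited,
 * -1 on error (e.g. EPERM from bind() without CAP_NET_ADMIN). */
int wait_for_exit_netlink(pid_t pid)
{
    union { struct nlmsghdr hdr; char raw[8192]; } buf;
    struct sockaddr_nl addr = { .nl_family = AF_NETLINK, .nl_groups = CN_IDX_PROC };
    int rc = -1;

    if (pid <= 0)
        return -1;
    int sock = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
    if (sock < 0)
        return -1;
    if (bind(sock, (struct sockaddr *)&addr, sizeof addr) < 0 || proc_events_listen(sock) < 0)
        goto out;

    /* Subscribe first, then check: an exit in between cannot be missed. */
    if (kill(pid, 0) == -1 && errno == ESRCH) {
        rc = 0;
        goto out;
    }

    for (;;) {
        ssize_t n = recv(sock, &buf, sizeof buf, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto out;           /* ENOBUFS would mean events were dropped */
        }
        int len = (int)n;
        for (struct nlmsghdr *h = &buf.hdr; NLMSG_OK(h, len); h = NLMSG_NEXT(h, len)) {
            struct cn_msg *m = NLMSG_DATA(h);
            struct proc_event ev;
            /* copy out: the payload is not necessarily 8-byte aligned */
            memset(&ev, 0, sizeof ev);
            memcpy(&ev, m->data, m->len < sizeof ev ? m->len : sizeof ev);
            if (ev.what == PROC_EVENT_EXIT &&
                ev.event_data.exit.process_pid == pid &&
                ev.event_data.exit.process_tgid == pid) {
                rc = 0;         /* the thread-group leader has exited */
                goto out;
            }
        }
    }
out:
    close(sock);
    return rc;
}
```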

Shameless plug: I'm working on a program (open source of course; GPLv2) that does any of the three.

Community
David Z

On BSDs and OS X, you can use kqueue with EVFILT_PROC+NOTE_EXIT to do exactly that. No polling required. Unfortunately there's no Linux equivalent.

Hongli

You could also create a socket or a FIFO and read from it. The FIFO is especially simple: connect the standard output of your child to the FIFO and read. The read will block until the child exits (for any reason) or until it emits some data, so you'll need a little loop to discard the unwanted text data.

If you have access to the source of the child, open the FIFO for writing when it starts and then simply forget about it. The OS will clean up the open file descriptor when the child terminates, and your waiting "parent" process will wake up.
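A minimal sketch of the waiting side in C, assuming the FIFO already exists (created with mkfifo() at a path of your choosing) and that the monitored program opens it for writing and never closes it explicitly; `wait_for_fifo_writers` is a hypothetical name. Opening for reading blocks until a writer appears, and read() returns 0 (EOF) only once every writer has closed its end, which for the monitored program happens when it exits. One caveat: if that program forks children that inherit the descriptor, EOF is delayed until they have exited too.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block until all writers of the FIFO at `path` are gone.
 * Returns 0 on EOF, -1 on error. Any data written to the FIFO is discarded. */
int wait_for_fifo_writers(const char *path)
{
    int fd = open(path, O_RDONLY);      /* blocks until a writer opens it */
    if (fd < 0)
        return -1;

    char buf[512];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            break;                      /* EOF: every writer has closed */
        if (n < 0 && errno != EINTR) {
            close(fd);
            return -1;
        }
        /* n > 0: unwanted output, just drop it */
    }
    close(fd);
    return 0;
}
```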

Now this might be a process which you didn't start or own. In that case, you can replace the binary executable with a script that starts the real binary but also adds monitoring as explained above.

Aaron Digulla
  • Not a child, and especially it might not be designed with this tracking in mind, and you might not be able to modify the source code. – Lothar Dec 07 '15 at 22:44
  • @Lothar I think it's good to show some solution outside of the obvious, especially since the accepted answer is unreliable. Also, any process can be turned into a child somehow. For example, you can replace the binary with a script that monitors the original binary and sends a signal when the now-child dies. – Aaron Digulla Dec 11 '15 at 12:27

Here is a way to wait for any process (not necessarily a child) in Linux to exit (or get killed) without polling:

Using inotify to wait for /proc/<pid> to be deleted would be the perfect solution, but unfortunately inotify does not work with pseudo file systems like /proc. However, we can use it with the executable file of the process. While the process still exists, this file is held open. So we can use inotify with IN_CLOSE_NOWRITE to block until the file is closed. Of course, it can be closed for other reasons (e.g. if another process with the same executable exits), so we have to filter those events by other means.

We can use kill(pid, 0), but that can't guarantee that it is still the same process. If we are really paranoid about this, we can do something else.

Here is a way that should be 100% safe against PID-reuse trouble: we open the pseudo directory /proc/<pid> and keep it open until we are done. If a new process is created in the meantime with the same PID, the directory file descriptor that we hold will still refer to the original process (or become invalid if the old process ceases to exist), but will NEVER refer to the new process with the reused PID. Then we can check whether the original process still exists by checking, for example, whether the file "cmdline" exists in the directory with openat(). When a process exits or is killed, those pseudo files cease to exist too, so openat() will fail.

Here is some example code:

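#include <fcntl.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

// Block until process `pid` exits (or is killed).
// Returns -1 on error, or 0 once the process is gone.
int wait_for_pid(int pid)
{
    char path[32];
    int in_fd = inotify_init();
    if (in_fd < 0)
        return -1;

    // inotify follows the /proc symlink, so this watches the executable itself.
    snprintf(path, sizeof(path), "/proc/%d/exe", pid);
    if (inotify_add_watch(in_fd, path, IN_CLOSE_NOWRITE) < 0) {
        close(in_fd);
        return -1;
    }

    // Hold /proc/<pid> open: this fd can never refer to a process reusing the PID.
    snprintf(path, sizeof(path), "/proc/%d", pid);
    int dir_fd = open(path, O_RDONLY | O_DIRECTORY);
    if (dir_fd < 0) {
        close(in_fd);
        return -1;
    }

    int res = 0;
    // Room for several events, aligned as struct inotify_event requires.
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    while (1) {
        // Check before blocking, in case the process is already gone.
        // (A zombie keeps its /proc entries until its parent reaps it.)
        int f = openat(dir_fd, "fd", O_RDONLY);
        if (f < 0)
            break;  // pseudo files vanished: the process has exited
        close(f);

        // Sleep until some process closes the executable, then re-check.
        if (read(in_fd, buf, sizeof(buf)) < 0) {
            res = -1;
            break;
        }
    }

    close(dir_fd);
    close(in_fd);
    return res;
}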
l_belev
  • There is a ***bug***! As `/proc/PID/exe` generally points to a script or executable file located on your filesystem, any other process accessing the same file will trigger `inotifywait`!! Have a look at [my answer](https://stackoverflow.com/a/76809572/1765658) – F. Hauri - Give Up GitHub Aug 01 '23 at 08:13

You could attach to the process with ptrace(2). From the shell, strace -p PID >/dev/null 2>&1 seems to work. This avoids busy-waiting, though it will slow down the traced process, and it will not work on all processes (only yours, which is a bit better than only child processes).
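A sketch of the same idea in C, assuming Linux 3.4 or later (for PTRACE_SEIZE and PTRACE_LISTEN) and permission to trace the target: same user, and Yama's ptrace_scope setting may further restrict attaching to non-descendants. `wait_for_exit_ptrace` is a hypothetical name. PTRACE_SEIZE attaches without stopping the tracee; signal-delivery stops are resumed with the original signal passed on, group-stops are left stopped with PTRACE_LISTEN, and the loop ends when waitpid() reports that the tracee exited or was killed. Only one tracer can be attached at a time, so this fails if a debugger is already attached.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: trace `pid` and block until it exits or is killed.
 * Returns 0 at that point, -1 if attaching fails (EPERM, ESRCH, ...). */
int wait_for_exit_ptrace(pid_t pid)
{
    if (pid <= 0 || ptrace(PTRACE_SEIZE, pid, NULL, NULL) == -1)
        return -1;

    for (;;) {
        int status;
        if (waitpid(pid, &status, __WALL) == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return 0;                               /* the tracee is gone */
        if (!WIFSTOPPED(status))
            continue;

        int sig = WSTOPSIG(status);
        if ((status >> 16) == PTRACE_EVENT_STOP) {
            if (sig == SIGSTOP || sig == SIGTSTP || sig == SIGTTIN || sig == SIGTTOU)
                ptrace(PTRACE_LISTEN, pid, NULL, NULL);  /* group-stop: stay stopped */
            else
                ptrace(PTRACE_CONT, pid, NULL, NULL);    /* any other event stop */
        } else {
            /* signal-delivery-stop: resume and pass the signal on unchanged */
            ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig);
        }
    }
}
```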

a3nm
  • Knowledge never harms, but for shells, I recommend the "standard" way, polling periodically; see [question 1058047](http://stackoverflow.com/questions/1058047/wait-for-any-process-to-finish). Although it may be a rare case, strace can make a busy loop, e.g. $ (read) &; strace -p $!. Notice that (read) & itself is innocuous. – teika kazura Jul 30 '12 at 10:40

None that I am aware of. Apart from the solution from chaos, you can use semaphores if you can change the program you want to wait for.

The library functions are sem_open(3), sem_init(3), sem_wait(3), ...

sem_wait(3) performs a blocking wait, so you don't have to busy-wait as in chaos' solution. Of course, using semaphores makes your programs more complex, and it may not be worth the trouble.
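A minimal sketch with a POSIX named semaphore, assuming you can add one call to the program being waited for; the semaphore name and both helper names are hypothetical. The waiter blocks in sem_wait() without busy-waiting, and the monitored program posts once just before it exits (for example from an atexit() handler). A crash or SIGKILL skips the post, so the waiter would then block forever, and the semaphore persists until sem_unlink() is called. On older glibc versions, link with -pthread.

```c
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for the waiting side: block until `name` is posted. */
int wait_for_exit_sem(const char *name)
{
    sem_t *sem = sem_open(name, O_CREAT, 0600, 0);  /* create with count 0 if needed */
    if (sem == SEM_FAILED)
        return -1;
    int rc;
    while ((rc = sem_wait(sem)) == -1 && errno == EINTR)
        ;                                           /* retry if interrupted */
    sem_close(sem);
    return rc;
}

/* Hypothetical helper for the monitored program: call it right before exiting. */
int announce_exit_sem(const char *name)
{
    sem_t *sem = sem_open(name, O_CREAT, 0600, 0);
    if (sem == SEM_FAILED)
        return -1;
    int rc = sem_post(sem);
    sem_close(sem);
    return rc;
}
```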

Jochen Walter
  • These semaphores are virtually useless, as they persist even if no process has them open. I remember having to periodically call ipcrm to clean up the leftovers of some crashed process. – Jan 03 '14 at 22:32

Maybe it could be possible to wait for /proc/[pid] or /proc/[pid]/[something] to disappear?

There are poll() and other file-event waiting functions; maybe they could help?

Talkless
  • Yes, it's a good idea. Unless the same process id is reused so quickly - but probably this happens rarely – CsTamas Aug 12 '10 at 18:26
  • @CsTamas, there is protection where the number of process identifiers (32768) is much larger than the number of processes that can run. So the likelihood that you get the same process identifier is really low unless you fall asleep for a while. – Alexis Wilke Jul 29 '16 at 04:33

Since linux kernel 5.3 there is a pidfd_open syscall, which creates an fd for a given pid, which can be polled to get notification when pid has exited.
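A minimal sketch in C, assuming Linux 5.3 or later; `wait_for_exit_pidfd` is a hypothetical name. Because older C libraries have no pidfd_open() wrapper, the raw syscall is used, with 434 (the number in the generic syscall table) as an assumed fallback for old headers. The descriptor becomes readable (POLLIN) once the process has terminated, and since it keeps referring to that exact process, a reused PID cannot fool it.

```c
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434   /* assumption: generic syscall number */
#endif

/* Hypothetical helper: wait up to timeout_ms (-1 = forever) for `pid` to exit.
 * Returns 0 if it exited, 1 on timeout, -1 on error. */
int wait_for_exit_pidfd(pid_t pid, int timeout_ms)
{
    int fd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (fd < 0)
        return -1;      /* e.g. ESRCH: no such process, ENOSYS: kernel < 5.3 */

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);  /* note: restarts the full timeout */
    close(fd);

    if (n < 0)
        return -1;
    return n == 0 ? 1 : 0;  /* readable means the process has terminated */
}
```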

xonatius

Simply poll fields 2 and 22 of /proc/[PID]/stat. Field 2 contains the name of the executable and field 22 contains the start time. If the file disappears, the process is gone; if the values change, some other process has taken the same (freed) PID. Thus the method is very reliable.


You can use eBPF to achieve this.

The bcc toolkit implements many excellent monitoring capabilities based on eBPF. Among them, exitsnoop traces process termination, showing the command name and reason for termination, either an exit or a fatal signal.

   It catches processes of all users, processes in containers,  as  well  as  processes  that
   become zombie.

   This  works by tracing the kernel sched_process_exit() function using dynamic tracing, and
   will need updating to match any changes to this function.

   Since this uses BPF, only the root user can use this tool.

You can refer to this tool for related implementation.

You can get more information about this tool from the link below:

You can first install this tool and check whether it meets your needs, then refer to its implementation for your own code, or use some of the libraries it provides to implement your own functionality.

exitsnoop examples:

   Trace all process termination
          # exitsnoop

   Trace all process termination, and include timestamps:
          # exitsnoop -t

   Exclude successful exits, only include non-zero exit codes and fatal signals:
          # exitsnoop -x

   Trace PID 181 only:
          # exitsnoop -p 181

   Label each output line with 'EXIT':
          # exitsnoop --label EXIT

Another option

Wait for a (non-child) process' exit using Linux's PROC_EVENTS

Reference project: https://github.com/stormc/waitforpid

As mentioned in the project:

Wait for a (non-child) process' exit using Linux's PROC_EVENTS. Thanks to the CAP_NET_ADMIN POSIX capability permitted to the waitforpid binary, it does not need to be set suid root. You need a Linux kernel having CONFIG_PROC_EVENTS enabled.

hxysayhi

Thanks to @Hongli's answer about kqueue on macOS, I implemented it in Swift:

/// Wait any pids, including non-child pid. Block until all pids exit.
/// - Parameters:
///   - timeout: wait until interval, nil means no timeout
/// - Throws: WaitOtherPidError
/// - Returns: isTimeout
func waitOtherPids(_ pids: [Int32], timeout: TimeInterval? = nil) throws -> Bool {
    
    // create a kqueue
    let kq = kqueue()
    if kq == -1 {
        throw WaitOtherPidError.createKqueueFailed(String(cString: strerror(errno)!))
    }
    
    // input
    // multiple changes is OR relation, kevent will return if any is match
    var changes: [Darwin.kevent] = pids.map({ pid in
        Darwin.kevent.init(ident: UInt(pid), filter: Int16(EVFILT_PROC), flags: UInt16(EV_ADD | EV_ENABLE), fflags: NOTE_EXIT, data: 0, udata: nil)
    })
    
    let timeoutDeadline = timeout.map({ Date(timeIntervalSinceNow: $0)})
    let remainTimeout: () ->timespec? = {
        if let deadline = timeoutDeadline {
            let d = max(deadline.timeIntervalSinceNow, 0)
            let fractionalPart = d - TimeInterval(Int(d))
            return timespec(tv_sec: Int(d), tv_nsec: Int(fractionalPart * 1000 * 1000 * 1000))
        } else {
            return nil
        }
    }
    
    // output
    var events = changes.map{ _ in Darwin.kevent.init() }
    
    while !changes.isEmpty {
        
        // watch changes
        // sync method
        let numOfEvent: Int32
        if var timeout = remainTimeout() {
            numOfEvent = kevent(kq, changes, Int32(changes.count), &events, Int32(events.count), &timeout);
        } else {
            numOfEvent = kevent(kq, changes, Int32(changes.count), &events, Int32(events.count), nil);
        }
        
        if numOfEvent < 0 {
            throw WaitOtherPidError.keventFailed(String(cString: strerror(errno)!))
        }
        if numOfEvent == 0 {
            // timeout. Return directly.
            return true
        }
        
        // handle the result
        let realEvents = events[0..<Int(numOfEvent)]
        let handledPids = Set(realEvents.map({ $0.ident }))
        changes = changes.filter({ c in
            !handledPids.contains(c.ident)
        })

        for event in realEvents {
            if Int32(event.flags) & EV_ERROR > 0 { // @see 'man kevent'
                let errorCode = event.data
                if errorCode == ESRCH {
                    // "The specified process to attach to does not exist"
                    // ignored
                } else {
                    print("[Error] kevent result failed with code \(errorCode), pid \(event.ident)")
                }
            } else {
                // succeeded event, pid exit
            }
        }
    }
    return false
}
enum WaitOtherPidError: Error {
    case createKqueueFailed(String)
    case keventFailed(String)
}

leavez

PR_SET_PDEATHSIG can be used to wait for termination of the parent process.

Mr.Wang from Next Door

My solution (using inotifywait)

This is based on 's /proc filesystem.

My need was to start a second (overall) backup once the container backups are done. The container backups are started by cron.

Watching for cron tasks

read -r wpid < <(ps -C backup.sh ho pid)
ls -l /proc/$wpid/fd
total 0
lr-x------ 1 user user 64  1 aoû 09:13 0 -> pipe:[455151052]
lrwx------ 1 user user 64  1 aoû 09:13 1 -> /tmp/#41418 (deleted)
lrwx------ 1 user user 64  1 aoû 09:13 2 -> /tmp/#41418 (deleted)

The deleted entries were created by cron. Even though they are deleted, you can still watch the file descriptors directly:

inotifywait  /proc/$wpid/fd/1
/proc/511945/fd/1 CLOSE_WRITE,CLOSE 

or

inotifywait  /proc/$wpid/fd/0
/proc/511945/fd/0 CLOSE_NOWRITE,CLOSE 

Note: my overall backup runs as root! If yours doesn't, this could require sudo, because the command runs under cron's session!

Same session

To test it, in a first window, run:

sleep 0.42m <<<'' >/dev/null 2>&1

Then in another window:

read -r wpid < <(ps -C sleep wwho pid,cmd| sed 's/ sleep 0\.42m$//p;d')
ls -l /proc/$wpid/fd
total 0
lr-x------ 1 user user 64  1 aoû 09:38 0 -> pipe:[455288137]
l-wx------ 1 user user 64  1 aoû 09:38 1 -> /dev/null
l-wx------ 1 user user 64  1 aoû 09:38 2 -> /dev/null

Don't try to watch 1 or 2! Because they point to /dev/null, any process accessing /dev/null will trigger inotifywait.

inotifywait  /proc/$wpid/fd/0
/proc/531119/fd/0 CLOSE_NOWRITE,CLOSE

Elapsed time in seconds

1st window:

sleep 0.42m <<<'' >/dev/null 2>&1

2nd window:

read -r wpid < <(ps -C sleep wwho pid,cmd| sed 's/ sleep 0\.42m$//p;d')
startedAt=$(ps ho lstart $wpid | date -f - +%s)
inotifywait  /proc/$wpid/fd/0;echo $((EPOCHSECONDS-startedAt))
/proc/533967/fd/0 CLOSE_NOWRITE,CLOSE 
25

Conclusion.

Using inotifywait seems to be a good solution, especially when watching the command's standard input (fd/0). But this must be tested case by case.

F. Hauri - Give Up GitHub