Using sleep and wait -n to implement simple timeout in bash, race condition or not?

Question

If I do this in a bash script:

sleep 10 &
sleep_pid=$!
some_command &
cmd_pid=$!
wait -n

if kill -0 "$sleep_pid" 2> /dev/null; then
    # all ok
    kill "$sleep_pid"
else
    # some_command hung
    ...code to log diagnostics and then kill -9 $cmd_pid...
fi

where some_command is something that should be quick but can hang due to rare errors.

Is there a risk that some_command could finish and be reaped before "wait -n" starts, leaving only the sleep to wait for? Or does the '&' after a command guarantee that the shell won't call waitpid() on it until the next line of input has been handled?

It works in interactive shells. If you do:

sleep 10 &
sleep 0 &
wait -n

then "wait -n" returns right away, even if you wait a couple of seconds before running it. But I'm not sure whether this can be trusted in non-interactive shells.
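The same experiment can be repeated non-interactively by running it as a script. This is a sketch assuming bash 4.3 or later (the first version with `wait -n`); the 3-second timer and the 1-second pause are arbitrary stand-ins for the `sleep 10` and the manual delay above.

```shell
#!/usr/bin/env bash
# Non-interactive version of the interactive experiment:
# the quick job has already exited by the time `wait -n` is called.

SECONDS=0           # bash's built-in elapsed-time counter

sleep 3 &           # stands in for the long timer
timer_pid=$!
true &              # stands in for the quick command (like `sleep 0`)

sleep 1             # give the quick job time to exit before waiting

wait -n             # should return for the already-finished `true`
status=$?
elapsed=$SECONDS

kill "$timer_pid" 2>/dev/null   # clean up the timer

echo "wait -n returned status $status after about $elapsed s"
```

If `wait -n` had lost track of the finished job, it would instead block until the timer exits, so the reported time would be about 3 seconds rather than 1.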

EDIT: Clarifying need for diagnostics + some grammar.

It's *more* trustworthy in non-interactive shells -- you don't have your process-table entries getting reaped to give the user interactive feedback on jobs that completed. I wouldn't particularly trust this code in an interactive shell, but it should be quite solid in a noninteractive one. — Charles Duffy, Jun 21 '18 at 21:59
@CharlesDuffy So non-interactive shells don't do waitpid()/wait() unless explicitly asked to via the wait builtin? That means I should stop worrying about this and start looking for process leaks in all my other long running scripts instead. :) — Henrik Johansson, Jun 21 '18 at 22:30
And a solution similar to yours is suggested in an answer there: https://stackoverflow.com/a/10028986/94687 I find this kind of solutions elegant and clever; I couldn't come up with something similar myself. — imz -- Ivan Zakharyaschev, Feb 09 '22 at 18:00

ThrasherHT · score 2 · Answer 1 · 2018-06-21T19:47:26.170

I believe you may be able to use the timeout command to do this.
http://man7.org/linux/man-pages/man1/timeout.1.html

timeout 10s command_to_run

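You can check the exit status of timeout to tell whether it timed out. GNU timeout exits with status 124 when the time limit expires; otherwise it passes on the command's own exit status, so a plain nonzero check cannot tell a timeout from an ordinary failure.

timeout 2s sleep 10
status=$?

if [[ $status -eq 124 ]]; then
  echo "It timed out"
elif [[ $status -ne 0 ]]; then
  echo "It failed with status $status"
else
  echo "It was successful"
fi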

answered Jun 21 '18 at 18:49, edited Jun 21 '18 at 19:47

timeout works for most cases, but it doesn’t work if the thing you want to run is a function in your shell script, or if you want to get a stack trace before killing. (Maybe some versions of timeout allow the latter?) – Henrik Johansson Jun 21 '18 at 20:57
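For the shell-function case, one common workaround is to export the function with bash's `export -f` and run it in a child `bash -c` under `timeout`. A sketch assuming bash and GNU coreutils `timeout` (status 124 on expiry); `fast_function` and `slow_function` are hypothetical. Note the function then runs in a separate process, so it cannot change the calling script's variables.

```shell
#!/usr/bin/env bash
# Run shell functions under GNU timeout by exporting them to a child bash.

fast_function() {   # hypothetical: finishes quickly
  echo "fast done"
}

slow_function() {   # hypothetical: hangs longer than the limit
  sleep 10
}

export -f fast_function slow_function   # make them visible to `bash -c`

timeout 2s bash -c fast_function
fast_status=$?      # the function's own status (0 here)

timeout 1s bash -c slow_function
slow_status=$?      # 124 means timeout had to stop it

echo "fast: $fast_status, slow: $slow_status"
```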

score 0 · Answer 2 · answered Jun 21 '18 at 19:22


By using the $! variable, we avoid relying on interactive job control features. Try this:

...long executing command... &
pid_long=$!

sleep 3 &
pid_sleep=$!

wait -n
kill -KILL $pid_long

The problem here is PID recycling, though that is unlikely to happen within 3 seconds.

If the command finishes before the sleep (and its PID has not been recycled by a new process), kill produces an error message; we could redirect that to /dev/null.

We should probably also kill the sleep, in case it is the one that is still running.
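The three remarks above can be folded back into the snippet. This is a sketch assuming bash 4.3 or later; `sleep 30` is a hypothetical stand-in for the long-running command, and the timer is shortened to 1 second.

```shell
#!/usr/bin/env bash
# The answer's approach with its follow-up remarks applied:
# kill's error for an already-reaped PID is discarded, and the
# timer is killed as well in case it is the one still running.

SECONDS=0

sleep 30 &                      # ...long executing command...
pid_long=$!

sleep 1 &                       # the 1-second timer
pid_sleep=$!

wait -n                         # returns as soon as either job exits

kill -KILL "$pid_long" 2>/dev/null   # error discarded if already reaped
kill "$pid_sleep" 2>/dev/null        # likewise for the timer
wait                                 # reap whatever was just killed

echo "finished after about $SECONDS s"
```

As the answer notes, killing a PID that `wait -n` has already reaped still carries a small PID-recycling risk; this sketch does not remove it.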

answered by Kaz

PID recycling isn't going to happen if the old entry is still in the process table, and if `waitpid()` or `wait()` hasn't been called (which is automatic only in interactive shells), it'll still be there as a zombie. – Charles Duffy Jun 21 '18 at 21:56
In an effort to keep the question short and to the point, I left out a detail that matters here (now added in an edit): speculative killing won't let me detect the problem and collect diagnostic info before cleaning up. – Henrik Johansson Jun 21 '18 at 21:58
@CharlesDuffy But `wait` **has** been called. When "wait -n" happens to reap the interesting command rather than `sleep`, then `$pid_long` is no longer a valid PID. `kill` will produce an error about a nonexistent PID. I reproduced this case in testing. – Kaz Jun 21 '18 at 22:31
@CharlesDuffy Non-interactive script. – Kaz Jun 22 '18 at 02:26
Oh -- I misread what you were saying. Yes, you're right -- if `wait -n` reaps the interesting command, it's no longer running, so it can't be killed, and yes, the PID could potentially be repurposed in that case (since it no longer has a zombie). – Charles Duffy Jun 22 '18 at 03:27

score 0 · Accepted Answer · answered Jun 25 '18 at 10:28

As @CharlesDuffy pointed out in the comments, the answer is no: there is no race, provided the script runs in a non-interactive shell.

Also, in a non-interactive shell there is no need to make sure the wait comes directly after the command, since non-interactive shells don't automatically reap their children.
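The practical consequence is easy to check in a script: background jobs that exited long before `wait` is reached still report their exit status. A sketch assuming bash; the exit codes 3 and 5 are arbitrary.

```shell
#!/usr/bin/env bash
# Background jobs that exit before `wait` is reached still report their status.

(exit 3) &          # finishes almost immediately with status 3
p1=$!
(exit 5) &          # finishes almost immediately with status 5
p2=$!

sleep 1             # both jobs have exited by now

wait "$p1"; s1=$?   # 3
wait "$p2"; s2=$?   # 5

echo "statuses: $s1 $s2"
```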

One should probably still wrap this in a subshell, so that "wait -n" doesn't return early because some unrelated background job started earlier has finished.
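The question's pattern wrapped in a subshell might look like the sketch below. It assumes bash 4.3 or later; `run_with_timeout` is a hypothetical name, and returning 124 on timeout is borrowed from GNU timeout's convention rather than anything in the question. The function body is `( ... )` instead of `{ ... }`, so `wait -n` only sees the two jobs started inside it.

```shell
#!/usr/bin/env bash
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
# Returns the command's status, or 124 if the timer expired first.

run_with_timeout() (
  limit=$1
  shift

  sleep "$limit" &
  sleep_pid=$!
  "$@" &
  cmd_pid=$!

  wait -n                       # first of the two jobs to exit
  status=$?

  if kill -0 "$sleep_pid" 2>/dev/null; then
    # Timer still running, so the command finished first.
    kill "$sleep_pid"
    exit "$status"
  fi

  # Timer expired first: the command is presumed hung.
  # Hypothetical hook: log diagnostics here before killing it.
  echo "run_with_timeout: '$*' (PID $cmd_pid) timed out" >&2
  kill -KILL "$cmd_pid" 2>/dev/null
  wait "$cmd_pid"
  exit 124
)

run_with_timeout 3 true;     echo "true -> $?"
run_with_timeout 3 false;    echo "false -> $?"
run_with_timeout 1 sleep 10; echo "hung -> $?"
```

Because the subshell exits at the end, nothing started inside it lingers in the caller's job table.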
