196

I have a bash script that launches a child process that crashes (actually, hangs) from time to time for no apparent reason (it is closed source, so there isn't much I can do about it). As a result, I would like to launch this process with a time limit, and kill it if it has not returned successfully within that time.

Is there a simple and robust way to achieve that using bash?

wjandrea
Greg

9 Answers

298

(As seen in: BASH FAQ entry #68: "How do I run a command, and have it abort (timeout) after N seconds?")

You can use timeout*:

timeout 10 ping www.goooooogle.com

Otherwise, do what timeout does internally:

( cmdpid=$BASHPID; (sleep 10; kill $cmdpid) & exec ping www.goooooogle.com )

To apply a timeout to a longer piece of bash code, use the second option like this:

( cmdpid=$BASHPID
  (sleep 10; kill $cmdpid) &
  while ! ping -w 1 www.goooooogle.com
  do
      echo crap
  done )

* It's included in GNU Coreutils 8+, so most current Linux systems already have it installed. Otherwise you can install it, e.g. sudo apt-get install timeout or sudo apt-get install coreutils
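With GNU timeout, the exit status tells you whether the limit was hit: it exits with status 124 when the command timed out, and otherwise passes through the command's own status. A minimal sketch of using that, where run_limited is a hypothetical wrapper name and not part of coreutils:

```shell
# run_limited SECONDS CMD [ARGS...]
# Hypothetical wrapper: runs CMD under GNU timeout and reports a timeout on stderr.
run_limited() {
    local limit=$1 status
    shift
    timeout "$limit" "$@"
    status=$?
    # GNU timeout exits with 124 when it had to stop the command
    if [ "$status" -eq 124 ]; then
        echo "command timed out after $limit seconds" >&2
    fi
    return "$status"
}
```

For example, run_limited 10 ping www.goooooogle.com returns 124 if ping was still running after 10 seconds. If the process ignores SIGTERM, timeout's -k option (e.g. timeout -k 5 10 command) sends SIGKILL a few seconds after the first signal.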

wjandrea
Ignacio Vazquez-Abrams
  • 12
    Re Ignacio's reply in case anyone else wonders what I did: the `cmdpid=$BASHPID` will not take the pid of the *calling* shell but the (first) subshell that is started by `()`. The `(sleep`... thing calls a second subshell within the first subshell to wait 10 secs in the background and kill the first subshell which, after having launched the killer subshell process, proceeds to execute its workload... – jamadagni Jun 08 '14 at 01:12
  • This does not appear to work (anymore?) without keeping the subshell in the foreground with ';' instead of the '&', e.g. (cmdpid=$BASHPID; (sleep 10;kill $cmdpid) ; sleep 25). Without the ';' the sleep 25 lives on. – Rondo Mar 10 '16 at 04:52
  • 6
    This command doesn't 'finish early'. It will always kill the process at the timeout, but it won't handle the situation where the process didn't time out. – hawkeye Sep 15 '16 at 07:52
  • Can anyone explain why is `exec` important in `( cmdpid=$BASHPID; (sleep 10; kill $cmdpid) & exec ping www.goooooogle.com )`? Seems like it does not work without it. – foki Nov 28 '21 at 06:27
  • @foki the exec (see `help exec` for the builtin, and `man 3 exec` for the syscall) effectively replaces the process of whatever calls exec (in this case, the surrounding subshell `( )`) with the command executable. If the subshell's PID is X, then the command's PID will be X. If you don't use `exec`, then the command will run as a subprocess. – init_js Jul 01 '23 at 23:24
  • There's a bit of a process dance missing here, which is that the timer process might kill an unrelated process if the PID gets reused before the timer fires. It would be safer for the watchdog to check if its parent pid is 1 after the timer expires (in this case, it means that its parent process died during its sleep). – init_js Jul 01 '23 at 23:39
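The point about exec in the comments above can be checked directly. This is only a small demonstration, not part of the original answer: an outer shell prints its own PID and then starts an inner shell that prints its PID, once with exec and once without.

```shell
# Count the distinct PIDs printed by an outer shell and the command it starts.
# \$\$ is escaped so that the inner shell, not the outer one, expands it.

# With exec, the inner shell replaces the outer one and keeps its PID.
with_exec=$(sh -c 'echo $$; exec sh -c "echo \$\$"' | sort -u | wc -l)

# Without exec, the inner shell is a child process with a PID of its own.
# The trailing "true" stops the outer shell from exec-ing its last command on its own.
without_exec=$(sh -c 'echo $$; sh -c "echo \$\$"; true' | sort -u | wc -l)

echo "distinct PIDs with exec: $((with_exec)), without exec: $((without_exec))"
```

This should report one distinct PID with exec and two without, which is why kill $cmdpid in the answer only reaches ping when ping was started with exec.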
35
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) &

or to get the exit codes as well:

# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) & waiter=$!
# wait on our worker process and record its exit code
# (wait must run in this shell; inside $(...) the worker is not a child)
wait $pid
exitcode=$?
# kill the waiter subshell, if it still runs
kill -9 $waiter 2>/dev/null
# 0 if we killed the waiter, because that means the process finished before the waiter
finished_gracefully=$?
Dan
  • 9
    You shouldn't use `kill -9` before you try signals that a process can process first. – Dennis Williamson Mar 02 '11 at 02:31
  • True, I was going for a fast fix however and just assumed that he wants the process dead instantly because he said it crashes – Dan Mar 02 '11 at 15:27
  • 10
    That's actually a very bad solution. What if `dosmth` terminates in 2 seconds, another process takes the old pid, and you kill the new one ? – Teleporting Goat Jan 03 '17 at 10:26
  • 2
    PID recycling works by reaching the limit and wrapping around. It is very unlikely for another process to reuse the PID within the remaining 8 seconds, unless the system is going haywire completely. – kittydoor Nov 15 '19 at 14:30
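Following the comments above, here is a sketch of a gentler variant: it sends SIGTERM first and only falls back to SIGKILL after a grace period, and it cancels the watchdog as soon as the command finishes, which narrows (but does not close) the PID-reuse window. The function name run_then_escalate and its argument order are made up for this illustration.

```shell
# run_then_escalate LIMIT GRACE CMD [ARGS...]
# Hypothetical helper: TERM after LIMIT seconds, KILL after a further GRACE seconds.
run_then_escalate() {
    local limit=$1 grace=$2 pid watchdog status
    shift 2

    "$@" &
    pid=$!

    # Watchdog: ask politely first, then force. Its output is discarded so that
    # a leftover sleep cannot hold the caller's stdout or stderr open.
    (
        sleep "$limit"
        kill -TERM "$pid" || exit 0   # the process is already gone
        sleep "$grace"
        kill -KILL "$pid"
    ) >/dev/null 2>&1 &
    watchdog=$!

    # Returns the command's status, or 128 + signal number if it was killed
    wait "$pid"
    status=$?

    # Cancel the watchdog so it cannot fire later at a recycled PID
    kill "$watchdog" 2>/dev/null
    wait "$watchdog" 2>/dev/null
    return "$status"
}
```

Used as run_then_escalate 10 2 dosmth, it returns dosmth's own exit status if it finished in time, 143 if SIGTERM stopped it, and 137 if SIGKILL was needed.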
13
sleep 999&
t=$!
sleep 10
kill $t
DigitalRoss
  • It incurs excessive waiting. What if a real command (`sleep 999` here) often finishes faster than the imposed sleep (`sleep 10`)? What if I wish to give it a chance up to 1 minute, 5 minutes? What if I have a bunch of such cases in my script :) – it3xl Mar 09 '19 at 08:28
4

I also had this question and found two more things very useful:

  1. The SECONDS variable in bash.
  2. The command "pgrep".

So I use something like this on the command line (OSX 10.9):

ping www.goooooogle.com & PING_PID=$!; SECONDS=0; while pgrep -q 'ping'; do sleep 0.2; if [ "$SECONDS" -ge 10 ]; then kill $PING_PID; fi; done

As this is a loop I included a "sleep 0.2" to keep the CPU cool. ;-)

(BTW: ping is a bad example anyway, as you would just use its built-in "-t" (timeout) option.)
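The same polling idea can be written as a function that tracks the child through $! rather than by name, so other processes with the same name are not affected. This is a sketch under a few assumptions: poll_with_timeout is a hypothetical name, a simple counter stands in for SECONDS so that it also runs under plain sh, and the command is assumed to exit on SIGTERM.

```shell
# poll_with_timeout LIMIT CMD [ARGS...]
# Hypothetical helper: polls once per second and kills CMD after LIMIT seconds.
poll_with_timeout() {
    local limit=$1 waited=0 pid
    shift

    "$@" &
    pid=$!

    # kill -0 sends no signal; it only tests whether the PID still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill "$pid" 2>/dev/null
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # Collect the exit status (128 + signal number if the command was killed)
    wait "$pid"
}
```

Unlike a fixed sleep, the loop ends within about a second of the command finishing, at the cost of waking up once per second.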

Ulrich
1

One way is to run the program in a subshell, and communicate with the subshell through a named pipe with the read command. This way you can check the exit status of the process being run and communicate this back through the pipe.

Here's an example of timing out the yes command after 3 seconds. It gets the PID of the process using pgrep (possibly only works on Linux). One problem with using a pipe is that a process opening a pipe for read will hang until it is also opened for write, and vice versa. So to prevent the read command hanging, I've "wedged" open the pipe for writing with a background subshell. (Another way to prevent a freeze is to open the pipe read-write, i.e. read -t 5 <>finished.pipe - however, that also may not work except on Linux.)

rm -f finished.pipe
mkfifo finished.pipe

{ yes >/dev/null; echo finished >finished.pipe ; } &
SUBSHELL=$!

# Get command PID
while : ; do
    PID=$( pgrep -P $SUBSHELL yes )
    test "$PID" = "" || break
    sleep 1
done

# Open pipe for writing
{ exec 4>finished.pipe ; while : ; do sleep 1000; done } &  

read -t 3 FINISHED <finished.pipe

if [ "$FINISHED" = finished ] ; then
  echo 'Subprocess finished'
else
  echo 'Subprocess timed out'
  kill $PID
fi

rm finished.pipe
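The blocking behaviour of named pipes mentioned above can be seen in isolation with a few lines. This is only an illustration, separate from the timeout logic; the temporary directory and variable names are arbitrary.

```shell
# Create a FIFO in a private temporary directory
dir=$(mktemp -d)
fifo=$dir/finished.pipe
mkfifo "$fifo"

# Writer: waits one second, then opens the FIFO for writing and sends one line.
( sleep 1; echo finished >"$fifo" ) &

# Reader: opening the FIFO for reading blocks until the writer opens its end,
# so this line takes about one second even though read itself has no timeout.
read -r message <"$fifo"
echo "got: $message"

wait
rm -rf "$dir"
```

If the writer never opens the pipe, the reader stays stuck in the open, before read gets a chance to apply any timeout. That is the hang the extra background writer in the answer is there to prevent.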
Gavin Smith
1

Assuming you have (or can easily make) a pid file for tracking the child's pid, you could then create a script that checks the modtime of the pid file and kills/respawns the process as needed. Then just put the script in crontab to run at approximately the period you need.

Let me know if you need more details. If that doesn't sound like it'd suit your needs, what about upstart?
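A rough sketch of what such a check script could look like. Everything here is an assumption made for illustration: the name kill_if_stale, the idea that the child touches its pid file while healthy, and the paths in the crontab comment. It uses find's -mmin test, which GNU and BSD find provide but POSIX does not require, and it only kills; respawning is left out.

```shell
# kill_if_stale PIDFILE MAX_MINUTES
# Hypothetical helper: if PIDFILE was last modified more than MAX_MINUTES ago,
# kill the PID stored in it and remove the file. Returns 1 if it killed, else 0.
kill_if_stale() {
    local pidfile=$1 max_minutes=$2 pid
    [ -f "$pidfile" ] || return 0

    # find prints the file name only when the file is older than the limit
    if [ -n "$(find "$pidfile" -mmin +"$max_minutes")" ]; then
        pid=$(cat "$pidfile")
        kill "$pid" 2>/dev/null
        rm -f "$pidfile"
        return 1
    fi
    return 0
}

# Example crontab entry (hypothetical paths), checking once a minute:
# * * * * * . /path/to/kill_if_stale.sh && kill_if_stale /var/run/child.pid 5
```

Note that this carries the same PID-reuse caveat as the other answers if the pid file outlives the process it names.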

kojiro
0

Here's an attempt which tries to avoid killing a process after it has already exited, which reduces the chance of killing another process with the same process ID (although it's probably impossible to avoid this kind of error completely).

run_with_timeout ()
{
  t=$1
  shift

  echo "running \"$*\" with timeout $t"

  (
  # first, run process in background
  (exec sh -c "$*") &
  pid=$!
  echo $pid

  # the timeout shell
  (sleep $t ; echo timeout) &
  waiter=$!
  echo $waiter

  # finally, allow process to end naturally
  wait $pid
  echo $?
  ) \
  | (read pid
     read waiter

     if test $waiter != timeout ; then
       read status
     else
       status=timeout
     fi

     # if we timed out, kill the process
     if test $status = timeout ; then
       kill $pid
       exit 99
     else
       # if the program exited normally, kill the waiting shell
       kill $waiter
       exit $status
     fi
  )
}

Use like run_with_timeout 3 sleep 10000, which runs sleep 10000 but ends it after 3 seconds.

This is like other answers which use a background timeout process to kill the child process after a delay. I think this is almost the same as Dan's extended answer (https://stackoverflow.com/a/5161274/1351983), except the timeout shell will not be killed if it has already ended.

After this program has ended, there will still be a few lingering "sleep" processes running, but they should be harmless.

This may be a better solution than my other answer because it does not use the non-portable shell feature read -t and does not use pgrep.

Gavin Smith
  • What's the difference between `(exec sh -c "$*") &` and `sh -c "$*" &`? Specifically, why use the former instead of the latter? – Justin C May 31 '18 at 17:11
0

Here's the third answer I've submitted here. This one handles signal interrupts and cleans up background processes when SIGINT is received. It uses the $BASHPID and exec trick used in the top answer to get the PID of a process (in this case $$ in a sh invocation). It uses a FIFO to communicate with a subshell that is responsible for killing and cleanup. (This is like the pipe in my second answer, but having a named pipe means that the signal handler can write into it too.)

run_with_timeout ()
{
  t=$1 ; shift

  trap cleanup 2

  F=$$.fifo ; rm -f $F ; mkfifo $F

  # first, run main process in background
  "$@" & pid=$!

  # sleeper process to time out
  ( sh -c "echo \$\$ >$F ; exec sleep $t" ; echo timeout >$F ) &
  read sleeper <$F

  # control shell. read from fifo.
  # final input is "finished".  after that
  # we clean up.  we can get a timeout or a
  # signal first.
  ( exec 0<$F
    while : ; do
      read input
      case $input in
        finished)
          test $sleeper != 0 && kill $sleeper
          rm -f $F
          exit 0
          ;;
        timeout)
          test $pid != 0 && kill $pid
          sleeper=0
          ;;
        signal)
          test $pid != 0 && kill $pid
          ;;
      esac
    done
  ) &

  # wait for process to end
  wait $pid
  status=$?
  echo finished >$F
  return $status
}

cleanup ()
{
  echo signal >$$.fifo
}

I've tried to avoid race conditions as far as I can. However, one source of error I couldn't remove is when the process ends near the same time as the timeout. For example, run_with_timeout 2 sleep 2 or run_with_timeout 0 sleep 0. For me, the latter gives an error:

timeout.sh: line 250: kill: (23248) - No such process

as it is trying to kill a process that has already exited by itself.

Gavin Smith
0
#Kill command after 10 seconds
timeout 10 command

#If you don't have timeout installed, this is almost the same
#(exec makes command take over the PID that kill "$$" targets):
sh -c '(sleep 10; kill "$$") & exec command'

#The same as above, with muted duplicate messages:
sh -c '(sleep 10; kill "$$" 2>/dev/null) & exec command'
Punnerud