I have a script that tracks a process and respawns it if it dies. I also want the tracking script to kill the process when told to, for example when the tracking script receives a SIGTERM. In other words, if I kill the tracking script, it should kill the process it's tracking, stop respawning it, and exit.

Cobbling together several posts (following what I think are best practices, for instance not using a PID file), I came up with the following:

#!/bin/bash

DESC="Foo Manager"
EXEC="python /myPath/bin/FooManager.pyc"

trap "BREAK=1;pkill -HUP -P $BASHPID;exit 0" SIGHUP SIGINT SIGTERM

until $EXEC
do
    echo "Server $DESC crashed with exit code $?.  Restarting..." >&2
    ((BREAK!=0)) && echo "Breaking" && exit 1
    sleep 1
done

Now I run this script in one xterm, and from another xterm I send the script something like:

kill -HUP <tracking_script_pid>  # Doesn't work.
kill -TERM <tracking_script_pid>  # Doesn't work.

The tracking script does not exit or react in any way. If I run FooManager.pyc directly from the command line, it dies on SIGHUP and SIGTERM. What could I be doing wrong here, and is there perhaps a whole different way to do it?

thanks.

Bitdiot
  • I suppose this is interesting as a bash exercise, but if you're trying to do this for something real, just get foreman: http://ddollar.github.io/foreman/ – Charlie Martin Dec 22 '14 at 18:00

1 Answer

From the manual:

If Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes. When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.

Emphasis is mine.

So in your case, while your command is executing, Bash will wait until it ends before it triggers the trap.
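The difference is easy to observe with a small experiment. The following is only a sketch, not part of the original script: the `sleep` commands stand in for the real program, and the variable names (`got_signal`, `wait_status`, `elapsed`) are made up for the illustration. The script sends itself SIGTERM after one second while it is blocked in `wait`:

```shell
#!/bin/bash
# Sketch: a trapped signal interrupts the `wait` builtin right away,
# whereas a foreground command would delay the trap until it finished.

got_signal=false
trap 'got_signal=true' SIGTERM

# Send ourselves SIGTERM after one second ($$ is still the main shell's PID
# inside the subshell).
( sleep 1; kill -TERM "$$" ) &

# Stand-in for the long-running program, started as a background job.
sleep 5 &
job_pid=$!

SECONDS=0
wait_status=0
wait "$job_pid" || wait_status=$?   # returns early when SIGTERM arrives
elapsed=$SECONDS

echo "wait returned $wait_status after about ${elapsed}s; got_signal=$got_signal"

# Clean up the stand-in job, which is still running at this point.
kill "$job_pid" 2>/dev/null || true
wait "$job_pid" 2>/dev/null || true
```

`wait` should return well before the five seconds are up, with a status greater than 128, and the trap has run by the time the `echo` executes. Replacing `sleep 5 &` plus `wait` with a plain foreground `sleep 5` should instead delay the trap until `sleep` finishes.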

To fix this, you need to run your program as a job, and wait for it. If your program never exits with a return code greater than 128, you could simplify the following code, but I'm not making this assumption:

#!/bin/bash

desc="Foo Manager"
to_exec=( python "/myPath/bin/FooManager.pyc" )

# The trap only sets a flag; the loop below decides what to do about it.
trap 'trap_triggered=true' SIGHUP SIGINT SIGTERM

trap_triggered=false
while ! $trap_triggered; do
   # Run the program as a background job so that `wait` can be interrupted.
   "${to_exec[@]}" &
   job_pid=$!
   wait "$job_pid"
   job_ret=$?
   if [[ $job_ret = 0 ]]; then
      echo >&2 "Job ended gracefully with no errors... quitting..."
      break
   elif ! $trap_triggered; then
      # Non-zero status and no trapped signal: the job itself failed.
      echo >&2 "Server $desc crashed with exit code $job_ret. Restarting..."
   else
      # `wait` was interrupted by a trapped signal; the job may still be running.
      printf >&2 "Received fatal signal... "
      if kill -0 "$job_pid" >&/dev/null; then
          printf >&2 "killing job $job_pid... "
          kill "$job_pid"
          wait "$job_pid"
      fi
      printf >&2 "quitting...\n"
   fi
done
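The `kill -0` test in the script sends no signal at all: it only checks whether the PID exists and whether the shell would be allowed to signal it. Here is a minimal sketch of that check in isolation, where `is_alive` is a hypothetical helper name and `sleep 30` stands in for the real job:

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the PID exists and we may signal it.
# Signal 0 performs the existence/permission check without delivering anything.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# Stand-in for the tracked program.
sleep 30 &
job_pid=$!

if is_alive "$job_pid"; then before=alive; else before=gone; fi

# Terminate the job and reap it so that its PID is released.
kill "$job_pid"
wait "$job_pid" 2>/dev/null || true

if is_alive "$job_pid"; then after=alive; else after=gone; fi

echo "before: $before, after: $after"
```

In principle a PID can be reused by an unrelated process after the job is reaped, so this check is a convenience rather than a guarantee.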

Notes.

  1. I used lowercase variable names, since uppercase names are considered bad practice: they can clash with Bash's reserved names or with environment variables.
  2. I stored the command in an array rather than a string. With a string, you'll run into problems as soon as an argument contains funny characters such as spaces. With a properly quoted array, you won't. (Some would argue that a function would be even better.)
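
To illustrate the second note, here is a small sketch comparing the two approaches. The `count_args` function and the file name are made up for the example; the function simply reports how many arguments it received:

```shell
#!/bin/bash
# Hypothetical helper that prints the number of arguments it was given.
count_args() { echo "$#"; }

# The same command, with one argument containing a space, stored both ways.
cmd_string='count_args "my file.txt"'
cmd_array=( count_args "my file.txt" )

# Unquoted string expansion is split on whitespace, and the embedded
# double quotes are treated as literal characters, not as quoting.
from_string=$($cmd_string)

# A quoted array expansion keeps each element as exactly one word.
from_array=$("${cmd_array[@]}")

echo "string: $from_string argument(s), array: $from_array argument(s)"
```

The string version passes two arguments (`"my` and `file.txt"`), while the array version passes the single argument `my file.txt` as intended.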
oguz ismail
gniourf_gniourf
  • Thanks for this. For the most part it works; I had to make some changes so that it would actually kill FooManager.pyc once the tracking script gets the trapped call. Anyway, thanks a lot!! – Bitdiot Dec 23 '14 at 13:23
  • Another question: is there some way to distinguish a bad exit of FooManager that *doesn't* trigger a trap? FooManager has the potential to crash, and it would be nice to be able to tell when that happens, as opposed to the tracking script getting a signal to kill it. – Bitdiot Dec 23 '14 at 13:26
  • @Bitdiot in the previous version I forgot to kill the job. Please review my edit: you'll see where and how to do it. – gniourf_gniourf Dec 23 '14 at 13:41
  • Cool, this is awesome man, thanks. Can you tell me what kill -0 does? While researching this problem I saw all these arguments that kill takes, but when I read the man page, none of them are described. At least, I can't find them. :) – Bitdiot Dec 23 '14 at 14:16
  • @Bitdiot see [here](http://stackoverflow.com/questions/11012527/what-does-kill-0-pid-in-a-shell-script-do) for an explanation of `kill -0`. – TTT Dec 23 '14 at 14:40