
I've got a program I didn't write; another department in our company did. Suppose the executable is called Snafu. I need to be able to run it unattended and detect if it crashed. This may happen 1 millisecond after launch, or 1 hour after launch. The program is a black box and it won't tell me or give me a hint that it has crashed. I don't have its PID either. Other than grepping the output of ps (let's say I want to run several Snafu instances at once), how can I detect the crash of one instance of Snafu? I will launch it from a BASH script.

duuuuxq
  • This question has a very good answer: https://stackoverflow.com/questions/696839/how-do-i-write-a-bash-script-to-restart-a-process-if-it-dies – LMC Apr 08 '18 at 00:16
  • Oh.. solve the halting problem? – Martin James Apr 08 '18 at 11:31
  • The question does not mention infinite loops, so I don't think they're getting tripped up by the halting problem. Detecting crashes seems like a very reasonable and doable thing. – Mattie Jul 08 '18 at 12:41

1 Answer


It depends a lot on how it crashes.

  • will it stop, i.e. exit
  • will it hang
  • will it thrash
  • etc.

If you treat it as a complete black box and it hangs or thrashes rather than exiting, you will never know that it has crashed, and therefore cannot react to it.

For the remainder, I will assume that snafu exits.

The more or less traditional way of doing it is to save each PID and poll it:

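#!/bin/bash
snafu 1 &
snaf1=$!    # $! is the PID of the most recent background job
snafu 2 &
snaf2=$!
while : ; do
    sleep 1
    # ps -p exits non-zero when no process with that PID exists
    if ps -p "$snaf1" > /dev/null 2>&1 ; then
        echo "snafu 1 still runs"
    else
        echo "snafu 1 died"
        snafu 1 &
        snaf1=$!
    fi
    # same check for snafu 2, using $snaf2
done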

See the remark from Luis Muñoz about the use of PIDs, ps, etc.; still, many people do it this way. Also, if you start snafu from a shell script, you will have its PID (in $!).
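Since $! gives you the PID, you can also block on it with wait instead of polling, and look at the exit status. The following is only a sketch, assuming that a crashing snafu either exits with a non-zero status or is killed by a signal; describe_status and run_and_report are hypothetical helper names, not part of snafu or Bash.

```shell
#!/bin/bash
# Hypothetical helper: turn an exit status into a readable verdict.
describe_status() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        echo "exited normally"
    elif [ "$status" -gt 128 ]; then
        # The shell reports death by signal N as status 128+N
        # (a program can also choose to exit with such a value itself).
        echo "killed by signal $((status - 128))"
    else
        echo "exited with error status $status"
    fi
}

# Hypothetical helper: start a command in the background, block until it
# ends, and report how it ended. Returns the command's exit status.
run_and_report() {
    "$@" &
    local pid=$!
    local status=0
    wait "$pid" || status=$?
    echo "PID $pid: $(describe_status "$status")"
    return "$status"
}
```

For example, run_and_report snafu 1 & would supervise one instance in the background; start one such supervisor per instance. This only covers the case where snafu actually exits.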

Another approach would be:

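#!/bin/bash
set -o monitor

# Define the handler before installing the trap.
what_when_died() {
    # actions to do when a snafu dies
    :   # placeholder; Bash does not allow an empty function body
}
trap what_when_died SIGCHLD

snafu 1 &
snafu 2 &

# Keep the script alive. The wait builtin returns as soon as a trapped
# signal arrives, after which the handler runs; a trap received during a
# foreground command may be delayed until that command finishes.
sleep inf &
while : ; do
    wait
done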

There are some behavioural aspects that you need to consider; for example, set -o monitor will make the shell pass SIGINT to the children instead of reacting to it.
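One more thing to keep in mind is that the SIGCHLD handler is not told which child exited. A possible way to find out, sketched here on the assumption that you saved each PID from $! at launch, is to check the saved PIDs inside the handler; list_dead is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print every PID from the argument list that no
# longer exists. kill -0 sends no signal; it only tests for existence.
list_dead() {
    local pid
    for pid in "$@"; do
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "$pid"
        fi
    done
}
```

Inside what_when_died you could then call something like list_dead "$snaf1" "$snaf2" and restart whatever it prints. Be aware that PIDs can be reused by the system, so this check is not watertight on a busy machine.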

Ljm Dullaart
  • I tried the trap approach. That didn't work for some reason. The program I'm testing I think launches another program. – duuuuxq Apr 08 '18 at 20:48