I've got a program I didn't write; another department in our company did. Suppose the executable is called Snafu. I need to be able to run it unattended and detect if it crashed. This may happen 1 millisecond after launch, or 1 hour after launch. The program is a black box and it won't tell me or give me a hint that it has crashed. I don't have its PID either. Other than grepping the output of ps (let's say I want to run several Snafu instances at once), how can I detect the crash of one instance of Snafu? I will launch it from a BASH script.
This question has a very good answer: https://stackoverflow.com/questions/696839/how-do-i-write-a-bash-script-to-restart-a-process-if-it-dies – LMC Apr 08 '18 at 00:16
Oh.. solve the halting problem? – Martin James Apr 08 '18 at 11:31
The question does not mention infinite loops, so I don't think they're getting tripped up by the halting problem. Detecting crashes seems like a very reasonable and doable thing. – Mattie Jul 08 '18 at 12:41
1 Answer
It depends a lot on how it crashes:
- will it stop, i.e. exit
- will it hang
- will it thrash
- etc.
With a completely black-box approach, a program that hangs or thrashes without exiting gives you no way of knowing that it has crashed, so it is impossible to react to it.
For the remainder, I will assume that snafu exits.
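The more or less traditional way of doing it is:
#!/bin/bash
snafu 1 &
snaf1=$!
snafu 2 &
snaf2=$!
while : ; do
    sleep 1
    # ps -p succeeds only while a process with that PID exists
    if ps -p "$snaf1" > /dev/null ; then
        echo "snafu 1 still runs"
    else
        echo "snafu 1 died"
        # restart it and remember the new PID
        snafu 1 &
        snaf1=$!
    fi
    # same check for snafu 2, using $snaf2
done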
See the remark from Luis Muñoz about the use of PIDs, ps, etc.; many people still do it this way, though. Also: if you start snafu from a shell script, you will have its PID (in $!).
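Because the launching script is the parent of snafu, it can also block on the process and read its exit status instead of polling. The following is only a minimal sketch of that idea: the helper name supervise and the restart limit are assumptions made for illustration, not something from the answer. By shell convention, a status above 128 means the process was killed by a signal (for example 139 for SIGSEGV).

```shell
#!/bin/bash
# Hypothetical helper: run a command in the foreground and restart it
# whenever it exits with a non-zero status.
# $1 is an illustrative restart limit; the remaining arguments are the command.
supervise() {
    local max_restarts=$1; shift
    local restarts=0 status
    while :; do
        "$@"
        status=$?
        if (( status == 0 )); then
            echo "'$*' exited cleanly"
            return 0
        fi
        if (( status > 128 )); then
            # The shell reports death by signal N as status 128+N.
            echo "'$*' was killed by signal $(( status - 128 ))" >&2
        else
            echo "'$*' exited with status $status" >&2
        fi
        if (( restarts >= max_restarts )); then
            echo "giving up on '$*' after $restarts restarts" >&2
            return "$status"
        fi
        restarts=$(( restarts + 1 ))
        sleep 1   # short pause so a command that fails instantly does not spin
    done
}
```

Each instance could then be started in the background, for example with supervise 5 snafu 1 & and supervise 5 snafu 2 &, followed by a final wait. Note that this only catches crashes that make snafu exit; a hung process still looks alive.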
Another approach would be:
#!/bin/bash
set -o monitor
what_when_died() {
    # actions to do when a snafu dies
    :   # placeholder: a bash function body cannot be empty
}
trap what_when_died SIGCHLD
snafu 1 &
snafu 2 &
sleep inf
There are some behavioural aspects that you need to consider: set -o monitor enables job control, so each background job runs in its own process group, and that changes how signals such as SIGINT reach the shell and its children.
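A variation that avoids the trap altogether is to let the script block in wait -n, which returns as soon as any one background job ends and hands back that job's exit status. This is a sketch under assumptions: it needs bash 4.3 or later, the helper name report_exits is hypothetical, and it does not tell you which instance ended, only that one did and with what status.

```shell
#!/bin/bash
# Hypothetical helper (not from the answer): wait for the given number of
# background jobs of the current shell and report each exit as it happens.
# Assumes bash >= 4.3, where `wait -n` waits for any single job.
report_exits() {
    local remaining=$1 status
    while (( remaining > 0 )); do
        wait -n            # returns 127 if there is no job left to wait for
        status=$?
        remaining=$(( remaining - 1 ))
        if (( status == 0 )); then
            echo "a job exited cleanly"
        elif (( status > 128 )); then
            echo "a job was killed by signal $(( status - 128 ))"
        else
            echo "a job exited with status $status"
        fi
    done
}
```

With two instances started as snafu 1 & and snafu 2 &, calling report_exits 2 would print one line per instance as each one ends. Like the trap, this only sees direct children of the script, so it will not notice the death of a program that snafu itself launches.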

Ljm Dullaart
I tried the trap approach. That didn't work for some reason. The program I'm testing I think launches another program. – duuuuxq Apr 08 '18 at 20:48