I'm trying to improve the startup scripts for several servers running in a cluster environment. The server processes should run indefinitely but occasionally fails on startup issuing e.g., Address already in use
exceptions.
I'd like the exit code for the startup script to reflect these early terminations by, say, waiting for 1 second and telling me if the server seems to have started okay. I also need the server PID echoed.
Here's my best shot so far:
$ cat startup.sh
# start the server in the bg but if it fails in the first second,
# then kill startup.sh.
CMD="start_server -option1 foo -option2 bar"
eval "($CMD >> cc.log 2>&1 || kill -9 $$ &)"
SERVER_PID=$!
# the `kill` above only has 1 second to kill me-- otherwise my exit code is 0
sleep 1
echo $SERVER_PID
The exit code works fine but two problems remain:
If the server is long-running but eventually encounters an error, the parent
startup.sh
will have exited already and the$$
PID may have been reused by an unrelated process which this script will then kill off.The
SERVER_PID
isn't correct since it's the PID of the subshell rather than thestart_server
command (which in this case is a grandchild of thestartup.sh
script.
Is there a simpler way to background the start_server
process, get its PID, and use a timeout'ed check for error codes? I looked into bash builtins wait
and timeout
but they don't seem to work for processes that shouldn't exit in the end.
I can't change the server code and the startup script should not run indefinitely.