How can a bash script restart a process on non-0 exit while sending signals to child

Question

Following this question: How do I write a bash script to restart a process if it dies?

I am trying to make a bash script which simply runs a python script and restarts the script if the script ends with a non-0 output. My bash script looks something like:

#!/bin/bash

trap 'kill $(jobs -p)' SIGTERM SIGKILL; 
until python test.py & wait; do 
  echo "Test Critically Crashed" >&2
  sleep 1; 
done

While my python script (though not really relevant) looks like:

import logging,sys,signal,time

def signal_term_handler(signal, frame):
  print("SIGTERM recieved...quitting")
  sys.exit(0)

signal.signal(signal.SIGTERM, signal_term_handler)
while True:
  time.sleep(1)
  sys.exit(1)

I would like to run the bash script and have it run my process infinitely until I send a sigterm or sigkill to the bash script in which it will send it to the child process (python test.py) and ultimately exit with code 0, thus breaking the until loop and exiting cleanly.

FYI I am using an infinitely running python script, and using this bash script as an entry point to a docker container.

FYI I learned with docker you shouldn't use this type of initializer you can just use "restart=on-failure" with docker run just as documented here: https://docs.docker.com/engine/reference/commandline/run/#restart-policies-restart — Brad Ruderman, Nov 29 '15 at 19:27
Can you not sort this out using [supervisord](http://supervisord.org/)? — 170730350, Nov 29 '15 at 20:41
`command & wait` is entirely pointless and broken (since `wait` without arguments always returns `0`). — Etan Reisner, Nov 29 '15 at 20:53
@EtanReisner : Maybe I'm missing your point, but my `man wait` says " wait with no operands, waits until all jobs known to the invoking shell have terminated. Seems to me O.P. has a reasonable expectation(:-?) . Good luck to all. — shellter, Nov 30 '15 at 00:57
@shellter `command & wait` is no different in what happens (the shell waiting until the command finishes before continuing execution) then simply `command` **and** it has the side effect of throwing away the return code from `command` (because `wait` with no arguments **always** returns `0` as the documentation says) and so is entirely unusable in an `until`/etc. context. — Etan Reisner, Nov 30 '15 at 02:20
`command & wait` is very different than `command` in the presence of signals. If a signal is sent to the shell given `command & wait`, any traps are immediately executed and `wait` returns non-zero, but when a signal is sent while the shell is running `command`, traps aren't executed until command completes. — William Pursell, Oct 03 '17 at 21:41

Matt · Accepted Answer · 2016-07-29T09:32:31.187

Don't write a shell script. Use systemd, supervisor, docker or any available service manager to manage the docker/script process directly. This is the job service managers were built to do, they live for it.

A systemd service would run docker run {image} python test.py and you would need to set it to run indefinitely.

A systemd config would look like:

[Unit]
Description=My Super Script
Requires=docker.service
After=docker.service

[Service]
ExecStart=/bin/docker run --name={container} --rm=true {image} python test.py
ExecStop=/bin/docker stop --time=10 {container}
TimeoutStopSec=11
KillMode=control-group

Restart=on-failure
RestartSec=5
TimeoutStartSec=5

[Install]
WantedBy=multi-user.target

The Restart=on-failure setting matches your requirement of only restarting the process when a non 0 exit code is returned so you can still kill the process underneath systemd, if required.

If you want to run and manage your python process inside an already running container, it might be easier to run supervisord as the main container process and have it manage python test.py. Supervisor is not as feature complete as systemd but it can do all the basic service management tasks.

How can a bash script restart a process on non-0 exit while sending signals to child

1 Answers1