My experience implementing graceful shutdown for celery
workers spawned by supervisord
inside a docker
container.
Supervisord part
supervisord.conf
...
[supervisord]
...
nodaemon=true # run supervisord in the foreground
[include]
files=celery.conf # path to the celery config file
Set nodaemon=true
so that we can start it as a background process from the entrypoint script later.
celery.conf
[group:celery_workers]
programs=one, two
[program:one]
...
command=celery -A backend --config=celery.py worker -n worker_one --pidfile=/var/log/celery/worker_one.pid --pool=gevent --concurrency=10 --loglevel=INFO
killasgroup=true
stopasgroup=true
stopsignal=TERM
stopwaitsecs=600
[program:two]
...
# similar to the previous one
The configuration file above is responsible for starting a group
of workers each running in a separate process
within a group
. I'd like to stop on a stopwaitsecs
section value. Let's see what the documentation tells us about it:
This parameter sets the number of seconds to wait for the OS to return
a SIGCHLD to supervisord after the program has been sent a
stopsignal. If this number of seconds elapses before supervisord
receives a SIGCHLD from the process, supervisord will attempt to kill
it with a final SIGKILL.
If stopwaitsecs
>stop_grace_period
specified for your service in a docker-compose
file then you'll be getting SIGKILL
from your docker
. Make sure
stopwaitsecs
<stop_grace_period
, otherwise all running tasks get interrupted by docker
.
Entrypoint script part
entrypoint.sh
#!/bin/bash
# safety switch, exit script if there's error.
set -e
on_close(){
echo "Signal caught..."
echo "Supervisor is stopping processes gracefully..."
# cleanup all pid files
rm worker_one.pid
rm worker_two.pid
supervisorctl stop celery_workers:
echo "All processes have been stopped. Exiting..."
exit 1
}
start_supervisord(){
supervisord -c /etc/supervisor/supervisord.conf
}
# start trapping signals (docker sends `SIGTERM` for shutdown)
trap on_close SIGINT SIGTERM SIGKILL
start_supervisord & # start supervisord in a background
SUPERVISORD_PID=$! # PID of the last background process started
wait $SUPERVISORD_PID
EXIT_STATUS=$? # the exit status of the last command executed
The script above consists of:
- registering a cleanup function
on_close
- starting
supervisord
's process group in a background
- registering the last background process's
PID
and waiting for it to finish
Docker part
docker-compose.yml
...
services:
celery:
...
stop_grace_period: 15m30s
entrypoint: [/entrypoints/entrypoint.sh]
The only setting worth mentioning here is entrypoint
form declaration. In our case better to use exec
form. It starts an executable script in a process with PID
1 and doesn't create any subprocesses as shell
form does. SIGTERM
from docker stop <container>
gets propagated to an executable which traps it and performs all cleaning and closing logic.