I have a Flask application that lets users start long-running tasks (sometimes > 1 day) via a Celery job queue. The Flask application and all its dependencies, including the Celery workers, are containerized with Docker and started from a docker-compose file.
My problem is that when I update the container images to a new version of the application software, I need to restart the containers with:
docker-compose down
docker-compose up -d
This cancels all long-running jobs, because docker-compose uses only a short shutdown timeout by default. Setting a longer timeout for a graceful stop, as suggested in docker-compose and graceful Celery shutdown, does not work for me: there is no way to predict how long the jobs will take, so the update could take very long until all tasks have finished.
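For reference, that workaround boils down to something like the following (the timeout value here is an arbitrary example), which only postpones the problem:

```bash
# Give compose a long grace period before it follows SIGTERM with SIGKILL
# (roughly the same effect as stop_grace_period in the compose file).
docker-compose down --timeout 3600
```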
My idea was to somehow detach the running container from docker-compose's control and then issue a graceful shutdown of Celery inside the detached container, which would let the running jobs finish while accepting no new jobs. Then I could start the normal container stack via docker-compose up -d.
Thus I would like to do the following (a sketch of the second step follows the list):
- remove/rename the Celery container so that it is no longer under docker-compose's control
- signal the Celery worker in that container to stop gracefully, letting the running jobs finish while accepting no new jobs
- then start the new containers, which will accept new jobs
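For the second step, Celery itself already supports this; a minimal sketch, assuming a detached worker container named worker_old and an app module myapp (both hypothetical names):

```bash
# Send SIGTERM: Celery performs a warm shutdown, i.e. the worker stops
# consuming new tasks and exits once the currently executing tasks finish.
docker kill --signal=SIGTERM worker_old

# Alternative via Celery's remote control API (note: without --destination
# this reaches every worker connected to the broker, not just this one):
docker exec worker_old celery -A myapp control shutdown
```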
I tried to use docker rename to rename containers started by docker-compose, but they still react to docker-compose down.
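Roughly what I tried (the compose-generated container name is an example):

```bash
# Rename the compose-managed worker container ...
docker rename karmada_docker_upgreat_karmada_celery_kalibrate_worker_1 detached_worker
# ... but compose identifies its containers by the com.docker.compose.* labels,
# not by name, so the renamed container is still stopped by:
docker-compose down
```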
My question is whether this approach is the right way to handle this, and whether it is even possible with docker-compose. What would be the best practice for graceful updates of Celery workers with long-running tasks in a docker-compose environment?
Other questions that I found that are related but do not solve the problem entirely:
docker-compose and graceful Celery shutdown: the answer shows how to stop the containers gracefully, but I want to start a new Celery worker immediately so that there is no downtime.
How do I restart celery workers gracefully?: This works for a local installation, but I have to restart the containers to get the new application code.
EDIT: New hints toward a solution:
In this issue I found a similar situation: docker-compose up --scale is used to duplicate a service, and one can then find the IDs of the old and the new containers. Once the new instance is up, one should be able to tell Celery in the old container to shut down and finish the executing tasks. If this is the solution I will add it as an answer later.
https://github.com/docker/compose/issues/1786#
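Transferred to my setup, the idea from the issue would look roughly like this (the service name is taken from my compose file; the ID lookup is my assumption):

```bash
# Start a second instance of the worker service next to the existing one,
# without recreating the old container from the new image.
docker-compose up -d --no-recreate --scale karmada_celery_kalibrate_worker=2

# List both containers to tell the old one (running longer) from the new one.
docker ps \
  --filter "label=com.docker.compose.service=karmada_celery_kalibrate_worker" \
  --format "{{.ID}}  {{.RunningFor}}  {{.Names}}"

# Once the new instance is up, warm-shut-down Celery in the old container.
docker kill --signal=SIGTERM <old_container_id>
```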
EDIT: Thinking more about the scaling variant: here, again, I run into the problem of the long-running tasks. It would be cumbersome to watch the dying container until I can scale back to 1 instance. In the linked example it was only important to check that the new service was really up before stopping the old one, so the script could scale back to a single instance immediately. I would rather duplicate the service and then remove the duplicate from docker-compose's control, so that it won't get killed when I scale back to 1 container. This should be possible by removing the docker-compose labels of the running container:
"Labels": {
"com.docker.compose.config-hash": "44e0bbd2a10e28bcad071a42315e65ed4d89f2d815a08aed4f3133b05b9d9f71",
"com.docker.compose.container-number": "1",
"com.docker.compose.oneoff": "False",
"com.docker.compose.project": "karmada_docker_upgreat",
"com.docker.compose.project.config_files": "docker-compose_test.yml",
"com.docker.compose.project.working_dir": "/home/USERNAME/git/karmada_docker_upgreat",
"com.docker.compose.service": "karmada_celery_kalibrate_worker",
"com.docker.compose.version": "1.25.0"
}
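These labels are exactly how compose finds "its" containers, which also explains why docker rename had no effect:

```bash
# Everything compose considers part of the project carries the project label:
docker ps --filter "label=com.docker.compose.project=karmada_docker_upgreat"
```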
Or is this the wrong track? Renaming the container makes no difference to docker-compose.
EDIT: Labels cannot be changed on a running container: https://github.com/moby/moby/issues/15496. The more I think about this, the more I believe I will have to use plain docker commands to run the Celery containers. With docker commands and a shell script it would be easy to achieve what I need, but I would still like to see a solution in docker-compose.
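A sketch of that plain-docker fallback (image tag, network name, app module and container names are all assumptions, not taken from the real project):

```bash
#!/bin/bash
OLD=celery_worker_old
NEW=celery_worker_new

# Start a worker from the updated image, outside of compose's control.
docker run -d --name "$NEW" --network karmada_docker_upgreat_default \
    myapp:new celery -A myapp worker --loglevel=info

# Warm shutdown of the old worker: finish running tasks, accept no new ones.
docker kill --signal=SIGTERM "$OLD"

# Block until the old worker has really exited, then remove it.
docker wait "$OLD"
docker rm "$OLD"
```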