
I have a docker-compose stack that contains the following: a Django application, RabbitMQ, Celery beat (the scheduler) and Celery workers.

When I roll out a new deployment of the docker-compose stack I need to kill the former containers, and I want to find a way to do it without killing a Celery worker in the middle of a task. Basically I want my CD job to be able to wait until the Celery worker is not running a task before we kill the container.

I thought about maybe:

  1. Killing the celery beat container
  2. Monitoring the number of tasks running in the celery worker container and, once it hits 0, killing the container.
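
The monitoring step could be scripted around `celery -A proj inspect active --json`, which prints a JSON mapping of worker name to its list of active tasks. A minimal sketch of the decision logic (the app name `proj` and the function name are placeholders, not from the original post):

```python
import json

def worker_is_idle(inspect_active_json: str) -> bool:
    """Parse the output of `celery -A proj inspect active --json` and
    return True if no worker reports an active task.
    An empty or missing reply is treated as idle."""
    reply = json.loads(inspect_active_json) if inspect_active_json else {}
    # reply looks like: {"celery@worker1": [{"id": "...", ...}, ...], ...}
    return all(not tasks for tasks in (reply or {}).values())
```

A CD job could poll this in a loop (e.g. every few seconds) and only stop the worker container once it returns `True`.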

The main issue is that killing the celery beat container alone does not stop the celery worker container from pulling new tasks: more tasks may have been queued in RabbitMQ before I killed celery beat, and as long as both RabbitMQ and the celery worker are up, the worker can keep pulling tasks directly from RabbitMQ.

I thought about killing the RabbitMQ container as well, but I don't think that's a good idea: we probably wouldn't lose data, but it may result in long downtime.

How do you suggest handling this with no data loss and minimal downtime?

Thank you!

user15937765

1 Answer


Every application can implement its own signal handlers, and Celery does.

Every orchestrator (the OS, Kubernetes, or docker-compose in your case) tries to terminate the applications it manages gently: first, a SIGTERM signal is sent to notify the application that it needs to stop, and a few seconds later, if it is still running, a SIGKILL is sent, which forces it to exit.

Those few seconds between SIGTERM and SIGKILL are the application's opportunity to finish gracefully (one reason for that is to ensure there is no data loss).

In your scenario, I assume you're using docker-compose down to stop all containers. docker-compose sends SIGTERM, waits 10 seconds, and then sends SIGKILL.

Celery workers implement a SIGTERM handler (a "warm shutdown"). That means the worker will try to finish the currently running tasks, and no new tasks will be pulled from the broker.

You might want to set worker_prefetch_multiplier to 1 (so the worker does not reserve tasks beyond the ones it is running), and task_acks_late to True (so if a task fails or is stopped in the middle, it will be redelivered and run again right after your deployment).
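
As a sketch, the two settings could live in a `celeryconfig.py` module (the module name is illustrative; it would be loaded with `app.config_from_object("celeryconfig")` on your Celery app):

```python
# celeryconfig.py -- sketch of the settings discussed above.

# Reserve only as many tasks as there are worker processes, so no extra
# tasks sit prefetched inside a worker that is about to shut down.
worker_prefetch_multiplier = 1

# Acknowledge a task only after it finishes, so a task killed mid-run
# stays in the queue and is redelivered after the deployment.
task_acks_late = True
```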

That being said, this assumes your maximum task duration is less than 10 seconds. If it isn't, you might want to increase the grace period between SIGTERM and SIGKILL in your orchestrator (for docker-compose, see the stop_grace_period option).
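
For example, a compose-file fragment raising the grace period to two minutes (the service name is a placeholder for your worker service):

```yaml
# docker-compose.yml (fragment, sketch)
services:
  worker:
    stop_grace_period: 2m   # wait up to 2 minutes after SIGTERM before SIGKILL
```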

ItayB