25

We use Celery with our Django webapp to manage offline tasks; some of these tasks can run up to 120 seconds.

Whenever we make any code modifications, we need to restart Celery to have it reload the new Python code. Our current solution is to send a SIGTERM to the main Celery process (kill -s 15 `cat /var/run/celeryd.pid`), then to wait for it to die and restart it (python manage.py celeryd --pidfile=/var/run/celeryd.pid [...]).
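For reference, the stop-then-wait half of that procedure could be scripted roughly as below. This is only a sketch assuming Python 3 on a Unix host; `warm_shutdown` and its parameters are hypothetical names for illustration, not part of Celery.

```python
import os
import signal
import time


def read_pid(pidfile):
    """Return the PID stored in the pidfile."""
    with open(pidfile) as f:
        return int(f.read().strip())


def is_alive(pid):
    """Probe a PID with signal 0 (Unix only); nothing is delivered."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def warm_shutdown(pidfile, timeout=300.0, poll=0.5, alive=is_alive, kill=os.kill):
    """Send SIGTERM to the PID in pidfile and wait until it exits.

    Returns True once the process is gone, False if it is still
    running when the timeout expires. `alive` and `kill` are
    injectable so the logic can be exercised without a real worker.
    """
    pid = read_pid(pidfile)
    kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True
```

A wrapper would call `warm_shutdown("/var/run/celeryd.pid")` and only launch the new celeryd when it returns True. This does nothing to shorten the wait itself.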

Because of the long-running tasks, this usually means the shutdown will take a minute or two, during which no new tasks are processed, causing a noticeable delay to users currently on the site. I'm looking for a way to tell Celery to shut down, but then immediately launch a new Celery instance to start running new tasks.

Things that didn't work:

  • Sending SIGHUP to the main process: this caused Celery to attempt to "restart," by doing a warm shutdown and then relaunching itself. Not only does this take a long time, it doesn't even work, because apparently the new process launches before the old one dies, so the new one complains ERROR: Pidfile (/var/run/celeryd.pid) already exists. Seems we're already running? (PID: 13214) and dies immediately. (This looks like a bug in Celery itself; I've let them know about it.)
  • Sending SIGTERM to the main process and then immediately launching a new instance: same issue with the Pidfile.
  • Disabling the Pidfile entirely: without it, we have no way of telling which of the 30 Celery processes is the main process that needs to be sent a SIGTERM when we want it to do a warm shutdown. We also have no reliable way to check if the main process is still alive.
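To make the pidfile checks above concrete, here is a small sketch that classifies a pidfile as missing, stale, or pointing at a live process. It assumes Python 3 on Unix; `pidfile_status` is a made-up helper and not how Celery itself performs this check.

```python
import os


def pid_is_alive(pid):
    """Signal 0 only tests for existence (Unix); nothing is delivered."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # Exists, but is owned by another user.
        return True
    return True


def pidfile_status(path, alive=pid_is_alive):
    """Return (state, pid), where state is 'missing', 'stale' or 'running'."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return ("missing", None)
    except ValueError:
        # The file exists but does not contain a PID.
        return ("stale", None)
    return ("running", pid) if alive(pid) else ("stale", pid)
```

A restart script could refuse to launch while the state is "running" and clean up the file when it is "stale".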
Martey
nitwit
  • Maybe my answer of http://stackoverflow.com/questions/9764913/how-do-i-restart-celery-workers-gracefully/16717128#16717128 helps you. – guettli May 23 '13 at 14:50

7 Answers

5

celeryd has an --autoreload option. If enabled, the celery worker (main process) will detect changes in celery modules and restart all worker processes. In contrast to the SIGHUP signal, autoreload restarts each process independently once its currently executing task finishes. This means that while one worker process is restarting, the remaining processes can keep executing tasks.

http://celery.readthedocs.org/en/latest/userguide/workers.html#autoreloading

mher
3

I've recently fixed the bug with SIGHUP: https://github.com/celery/celery/pull/662

mikenerone
Ivan Virabyan
  • Thanks! However, your fix doesn't change the fact that SIGHUP waits for all tasks to finish before terminating and relaunching, again causing the delay I'm trying to avoid. Ideas on how to take advantage of your fix and yet make it relaunch without waiting would be great... – nitwit Jun 04 '12 at 18:02
  • This is how I solved the problem. I put every long-running task (video conversion, email delivery) in a separate queue, which is processed by a separate worker. So when I send SIGHUP to all workers, I know that the worker processing tasks from the default queue doesn't block for long, because it only handles small tasks. The video conversion doesn't block small tasks. Only the video conversion queue is blocked for a while, but this is acceptable in my case. – Ivan Virabyan Jun 05 '12 at 07:38
  • So after some testing, I found out your fix also fixes the SIGTERM problem. So I finally managed to solve this problem once and for all by merging your fix and restarting Celery using: `kill -s SIGTERM $(cat /var/run/celeryd.pid) && python manage.py celeryd --pidfile=/var/run/celeryd.pid [...]` If you can put that in your answer, I'll accept it! – nitwit Jun 05 '12 at 07:45
  • I think that is unreliable behaviour. My patch has a little bug: it releases the pidlock too early (before all tasks are complete). As a result, it allows a new process to start before the old one has completely shut down. This is completely unreliable. It was fixed when the patch was merged to the master branch. What you call a bug with SIGTERM is not really a bug; it's just normal behaviour for every daemon. So I strongly recommend that you NOT take advantage of the mistake in the patch, and use the fixed version instead: https://github.com/ask/celery/commit/d3192eb5c1d9dcce21ea248c95df3783ccc161f2 – Ivan Virabyan Jun 05 '12 at 08:38
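The separate-queue setup mentioned in the comments above might look roughly like this in Django settings. It is a sketch only: the task names are hypothetical, and it assumes the old-style `CELERY_ROUTES` and `CELERY_DEFAULT_QUEUE` settings from the Celery versions of that era. `queue_for` is just an illustration of the lookup, not Celery's actual router.

```python
# Route the slow tasks to their own queue (task names are made up).
CELERY_ROUTES = {
    "myapp.tasks.convert_video": {"queue": "long"},
    "myapp.tasks.send_email": {"queue": "long"},
}

# Everything not listed above stays on the default queue.
CELERY_DEFAULT_QUEUE = "celery"


def queue_for(task_name):
    """Illustrative lookup: which queue a task name would be routed to."""
    return CELERY_ROUTES.get(task_name, {}).get("queue", CELERY_DEFAULT_QUEUE)
```

Each queue would then get its own worker, for example one celeryd started with `-Q long` and another with `-Q celery`, each with its own pidfile, so restarting the short-task worker never waits on a video conversion.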
2
rm *.pyc

This causes the updated tasks to be reloaded. I discovered this trick recently; I just hope there are no nasty side effects.
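If you'd rather do this from a deploy script, a Python sketch of the same cleanup is below. Note one difference: `rm *.pyc` only touches the current directory, while this version walks the whole tree under `root`. `remove_pyc` is a hypothetical helper name.

```python
from pathlib import Path


def remove_pyc(root):
    """Delete every .pyc file under root and return the removed paths."""
    # Collect first so we are not deleting while still walking the tree.
    targets = sorted(Path(root).rglob("*.pyc"))
    for path in targets:
        path.unlink()
    return targets
```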

Régis B.
1

Well, you are using SIGHUP (1) for a warm shutdown of Celery. I am not sure it actually causes a warm shutdown, but SIGINT (2) would. Try SIGINT in place of SIGHUP and then start Celery manually in your script (I guess).

Debanshu Kundu
0

A little late, but that can be fixed by deleting the file called celerybeat.pid.

Worked for me.

spac3_monkey
0

I think you can try this:

kill -s HUP `cat /var/run/celeryd.pid`
python manage.py celeryd --pidfile=/var/run/celeryd.pid

HUP may recycle every idle worker while leaving the workers that are still executing tasks running, and HUP will let these workers be trusted. You can then safely start a new Celery main process and its workers. The old workers should exit on their own once their tasks have finished.

I've used this approach in production and it seems safe so far. Hope this helps!

0

Can you launch it with a custom pid file name, possibly timestamped, and key off of that to know which PID to kill?

CELERYD_PID_FILE="/var/run/celery/%n_{timestamp}.pid"

^ I don't know the timestamp syntax, but maybe you do or can find it?

then use the current system time to kill off any old pids and launch a new one?
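Since I'm not aware of a timestamp placeholder in the pidfile setting itself, one way to sketch the idea is to build the name in a small wrapper and pass it to `--pidfile=`. This assumes Python 3; `new_pidfile`, `other_pidfiles`, and the `celeryd_<timestamp>.pid` naming are all hypothetical.

```python
import time
from pathlib import Path


def new_pidfile(directory, prefix="celeryd", now=None):
    """Build a timestamped pidfile path such as celeryd_1331406180.pid."""
    stamp = int(time.time() if now is None else now)
    return Path(directory) / f"{prefix}_{stamp}.pid"


def other_pidfiles(directory, current, prefix="celeryd"):
    """List pidfiles left by earlier instances, excluding the current one."""
    current = Path(current)
    return sorted(
        p for p in Path(directory).glob(f"{prefix}_*.pid") if p != current
    )
```

The wrapper would start the new instance with the path from `new_pidfile(...)`, then send SIGTERM to the PID stored in each file returned by `other_pidfiles(...)`.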

j_mcnally
  • I suspect you'll need a combination of one of the techniques in the question with this. Depending on your broker, you should be able to start a new celery with a timestamp-based pidfile (using `--pidfile=`), then send `SIGTERM` to all the other running celery processes to get them to warm shutdown (although note that there should really only be one, unless you try this while an old celeryd is still going through warm shutdown). – James Aylett Mar 10 '12 at 19:03