How do I troubleshoot an exit timeout of celeryd when running on Heroku (error R12)?

Question

I'm running celeryd on a Heroku dyno. When I shut it down and it has previously processed (even completed) at least one task, it doesn't shut down properly and I'm getting an error R12 (exit timeout) from Heroku.

Here's how I'm running celeryd from my Procfile (through Django and django-celery):

celeryd: python manage.py celeryd -E --loglevel=INFO

Here's what I'm doing to trigger it:

> heroku ps:scale web=0 celeryd=0 --app myapp

And here's the log output I'm getting:

2012-09-07T12:56:31+00:00 heroku[celeryd.1]: State changed from up to down
2012-09-07T12:56:31+00:00 heroku[api]: Scale to celeryd=0, web=1 by mail@mydomain.com
2012-09-07T12:56:32+00:00 heroku[web.1]: State changed from up to down
2012-09-07T12:56:32+00:00 heroku[api]: Scale to web=0 by mail@mydomain.com
2012-09-07T12:56:34+00:00 heroku[celeryd.1]: Stopping all processes with SIGTERM
2012-09-07T12:56:35+00:00 heroku[web.1]: Stopping all processes with SIGTERM
2012-09-07T12:56:37+00:00 heroku[web.1]: Process exited with status 143
2012-09-07T12:56:43+00:00 heroku[celeryd.1]: Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
2012-09-07T12:56:43+00:00 heroku[celeryd.1]: Stopping remaining processes with SIGKILL
2012-09-07T12:56:45+00:00 heroku[celeryd.1]: Process exited with status 137
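
For reference when reading the log above: the exit statuses follow the usual shell convention of 128 plus the signal number. The web process therefore exited on SIGTERM (143), while celeryd only exited once it was sent SIGKILL (137). A minimal sketch that decodes such statuses, assuming a Linux host and Python 3.5+; the helper name `describe_exit_status` is made up for illustration:

```python
import signal


def describe_exit_status(status):
    """Return the signal name encoded in a shell-style exit status, or None."""
    # Statuses above 128 conventionally mean "terminated by signal (status - 128)".
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            return None
    return None


if __name__ == "__main__":
    print(describe_exit_status(143))  # SIGTERM on Linux
    print(describe_exit_status(137))  # SIGKILL on Linux
```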

I originally experienced this on celery 2.5.5. I have since upgraded to 3.0.9 and still have the same problem.

As far as I can tell, my tasks have all completed. This error is reliably reproducible by running a single task on that celery dyno, giving it enough time to complete and then shutting the dyno down.

I don't know what else to check. Any idea how I can troubleshoot this? What could block celeryd from responding to Heroku's SIGTERM when the task has already completed?

@Murph I never figured it out, but I can no longer reproduce it either. However, I'm running quite a different configuration by now, with the latest celery and django-celery and a much more complex Procfile, with separate processes for multiple workers, cam and beat on the same dyno. — Henrik Heimbuerger, May 02 '13 at 14:01
I'm stuck on this issue as well, see https://github.com/yuvadm/heroku-periodical/issues/1 — Yuval Adam, Jun 23 '13 at 22:45
I'm still/again having this issue on celery 3.1.11. Not solved unfortunately. :( — Henrik Heimbuerger, Jul 30 '14 at 12:44

score 1 · Answer 1 · answered Jan 07 '14 at 04:49

I'm encountering the same issue. I'm not sure, but it may have been fixed, according to this changelog entry:

Worker with -B argument did not properly shut down the beat instance.

So if you're using celery beat inside a worker instance, you might need to upgrade.

Scott Coates


Indeed, that sounds good, thanks for the hint! Too bad they didn't note a corresponding ticket on their changelog. – Henrik Heimbuerger Jan 07 '14 at 12:17
Just stopped and restarted my workers and still get the same issue. It does seem to be an issue with Celerybeat though. In the logs I see all the workers shut down successfully but then `INFO/MainProcess beat: Shutting down...` and soon after `heroku[worker.1]: Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM` – johnboiles Feb 20 '14 at 01:01
FYI, I've had the same issue when running Celerybeat on a separate dyno and when running it on the same one as the workers. – johnboiles Feb 20 '14 at 01:02

score -1 · Answer 2 · answered Jun 26 '13 at 20:36

This sounds to me like Celery isn't catching the SIGTERM signal and reacting to it, so the process keeps running until the SIGKILL arrives.

This commit might help you out: https://github.com/cybertoast/celery/commit/e9a007b982b0f9268174ae94b351a9275eaef4a3
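
One way to test that hypothesis is to look at what the process is still doing during the 10 seconds after SIGTERM. The sketch below uses only the Python standard library to list every live thread, whether it is a daemon thread, and where it is currently executing; a non-daemon thread that is still running is a common reason for a Python process not to exit. The function name `format_live_threads` is hypothetical, and how and when to call it inside the worker (for example from a shutdown hook) is left open here, since that depends on the Celery version in use.

```python
import sys
import threading
import traceback


def format_live_threads():
    """Build a text report of all live threads and their current stacks."""
    # Maps thread id -> topmost stack frame for every running thread.
    frames = sys._current_frames()
    lines = []
    for thread in threading.enumerate():
        lines.append("Thread %s (daemon=%s)" % (thread.name, thread.daemon))
        frame = frames.get(thread.ident)
        if frame is not None:
            # format_stack returns one formatted entry per stack frame.
            lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
    return "\n".join(lines)


if __name__ == "__main__":
    # Writing to stderr so the report would show up in the dyno's log stream.
    sys.stderr.write(format_live_threads() + "\n")
```

If the report shows a thread blocked in a sleep or a network call after the worker has logged its shutdown, that thread is a likely candidate for what keeps the process alive until SIGKILL.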

Neil Middleton


I don't quite understand how that's related. It looks like an init.d script for CentOS, how does that help me on Heroku? It also just seems to be trying to 'kill harder'. Heroku already does that for me, the question is why Celery does not respond to the SIGTERM properly in the first place. – Henrik Heimbuerger Sep 01 '13 at 09:30
