2

I'm interested in using Celery for an app I'm working on. It all seems pretty straightforward, but I'm a little confused about what I need to do if I have multiple load-balanced application servers. All of the documentation assumes that the broker will be on the same server as the application. Currently, all of my application servers sit behind an Amazon ELB and tasks need to be able to come from any one of them.

This is what I assume I need to do:

  • Run a broker server on a separate instance
  • Configure each application instance to connect to that broker server
  • Each application instance will also be a Celery worker (running celeryd)?
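
A minimal sketch of what that shared configuration might look like, assuming Celery 3.x-style uppercase setting names. The broker hostname, credentials, and vhost are placeholders, not values from a real setup:

```python
# celeryconfig.py: deployed unchanged on every application instance (sketch).
# Hostname, credentials, and vhost are placeholders for illustration only.
BROKER_HOSTNAME = "broker.internal.example.com"  # hypothetical separate broker instance

# Celery 3.x-style setting name; Celery 4+ uses the lowercase `broker_url`.
BROKER_URL = "amqp://myuser:mypassword@{host}:5672/myvhost".format(
    host=BROKER_HOSTNAME
)
```

With the same module on every instance, any of them can publish tasks to the broker and any worker can pick them up.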

My only beef with that is: what happens if my broker instance dies? Can I run 2 broker instances somehow so I'm safe if one goes under?

Any tips or information on what to do in a setup like mine would be greatly appreciated. I'm sure I'm missing something or not understanding something.

Lyle Pratt

3 Answers

3

For future reference, for those who do prefer to stick with RabbitMQ...

You can create a RabbitMQ cluster from 2 or more instances. Add those instances to your ELB and point your celeryd workers at the ELB. Just make sure you connect the right ports and you should be all set. Don't forget to allow your RabbitMQ machines to talk among themselves to run the cluster. This works very well for me in production.

One exception here: if you need to schedule tasks, you need a celerybeat process. For some reason, I wasn't able to connect the celerybeat to the ELB and had to connect it to one of the instances directly. I opened an issue about it and it is supposed to be resolved (didn't test it yet). Keep in mind that celerybeat by itself can only exist once, so that's already a single point of failure.
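
For context, this is roughly what a periodic task schedule looks like in Celery 3.x-style configuration. It is only a sketch: the entry name and task name are hypothetical. Only the one celerybeat process reads this schedule and sends the tasks; the workers just execute them.

```python
from datetime import timedelta

# Celery 3.x-style periodic task schedule, consumed by the single celerybeat process.
CELERYBEAT_SCHEDULE = {
    "cleanup-every-5-minutes": {           # hypothetical entry name
        "task": "tasks.cleanup",           # hypothetical task name
        "schedule": timedelta(minutes=5),  # how often celerybeat sends the task
    },
}
```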

zvikico
  • How did you configure so that the ELB doesn't kill off the connection after 60 seconds? [2013-07-17 11:03:40,395: ERROR/MainProcess] consumer: Cannot connect to amqp://usr@elburl:5672/vhost: Socket closed. Trying again in 2.00 seconds... – moodh Jul 17 '13 at 11:06
  • I'm using a TCP health check, not HTTP, against the RabbitMQ port. Works well for me. – zvikico Jul 17 '13 at 11:17
  • I switched to TCP health check against the rabbitmq port. My celery worker still times out at exactly 1 minute when the broker_url is set against the ELB. How did you configure celery? I tried using BROKER_HEARTBEAT but no luck there either. :/ – moodh Jul 17 '13 at 11:20
  • Using BROKER_HOST that points to the ELB public DNS name. I should point out that I'm using Celery 2.4.x, haven't tested it with 3.x. – zvikico Jul 17 '13 at 12:05
  • Yeah that's what I'm doing, but with 3.0. I noticed BROKER_HEARTBEAT is active under 3.x, maybe that has something to do with it.. – moodh Jul 17 '13 at 12:14
  • Seems I messed up with the listeners and had port 5672 on http instead of tcp, switched that and it seems to work. Thanks anyway! :D – moodh Jul 17 '13 at 13:51
  • I'm glad it's sorted out. Keep in mind that the beat machine is a single point of failure. If it plays a critical part in your architecture, you need to do something about it. – zvikico Jul 18 '13 at 07:01
  • I don't use celerybeat. BROKER_HEARTBEAT simply polls the connection every X seconds. It's the default in 2.4.x but not in 3.0. That's what caused my timeouts. =) I simply reactivated it and everything works, no single points of failure. – moodh Jul 18 '13 at 07:07
  • The Celery Beat is the task scheduler, not the heartbeat. If you use periodic tasks, you still need the Celery Beat and you can still have just one of those running. http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html – zvikico Jul 18 '13 at 08:16
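
To summarize the settings discussed in the comments above, here is a sketch of a worker configuration that goes through the ELB, assuming Celery 3.x-style setting names. The ELB DNS name and the heartbeat value are assumptions for illustration. The idea is to keep the heartbeat interval well under the 60 seconds after which the ELB drops an idle connection. The ELB listener for port 5672 also has to be TCP, not HTTP.

```python
# Sketch of broker settings for workers that connect through an ELB.
ELB_DNS_NAME = "my-rabbit-elb-123456.us-east-1.elb.amazonaws.com"  # hypothetical
ELB_IDLE_TIMEOUT_SECONDS = 60  # idle timeout observed in the comments above

# Point the workers at the ELB rather than at a single RabbitMQ node.
BROKER_URL = "amqp://usr:password@{host}:5672/vhost".format(host=ELB_DNS_NAME)

# Heartbeat interval in seconds (assumed value); it must stay below the
# idle timeout so the connection never looks idle to the ELB.
BROKER_HEARTBEAT = 10
```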
1

You are correct in all points.

To make the broker reliable, set up a clustered RabbitMQ installation, as described here: http://www.rabbitmq.com/clustering.html

  • Thanks for your answer and the link to the clustering information! However, I decided to use Amazon's SQS service for my broker instead of running my own RabbitMQ cluster. It turns out Celery has built-in SQS support. See this SO question: http://stackoverflow.com/questions/8048556/celery-with-amazon-sqs – Lyle Pratt Apr 14 '12 at 17:29
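
For anyone going the SQS route, a rough sketch of the broker settings, assuming Celery 3.x-style setting names. The credentials and region below are made-up placeholders. The keys are URL-quoted because AWS secret keys can contain characters such as `/` and `+` that would otherwise break the URL:

```python
from urllib.parse import quote

# Placeholder credentials for illustration only; never hard-code real keys.
AWS_ACCESS_KEY_ID = "AKIAEXAMPLEKEYID"
AWS_SECRET_ACCESS_KEY = "example/secret+key"

# SQS broker URL: quoted credentials, no hostname after the "@".
BROKER_URL = "sqs://{key}:{secret}@".format(
    key=quote(AWS_ACCESS_KEY_ID, safe=""),
    secret=quote(AWS_SECRET_ACCESS_KEY, safe=""),
)

# The region is an assumption; use the one your application servers run in.
BROKER_TRANSPORT_OPTIONS = {"region": "us-east-1"}
```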
0

Celery beat also doesn't have to be a single point of failure if you run it on every worker node with:

https://github.com/ybrs/single-beat

curran736