
Our Rails application has some very intensive background processes, sometimes taking several hours to run. We are using delayed_job, and would consider moving to Resque or the free version of Sidekiq if it made sense in the context of this question.

We are hitting 100% CPU on all processors for some of the jobs, and currently the background processors run on the same physical server as Nginx, Rails and Postgres. We also expect the load to rise.

We would like to move the background processing off to a pool of commodity-level batch processing VMs, and preferably spin them up as needed. The way I am thinking is to extract the perform code into mini-apps and put them onto the batch processing VMs.

What I am not sure about is how to code this, and also how to load-balance the job queues across different VMs. Is this something that delayed_job/Resque/Sidekiq can do, or do I need to code it myself?

EDIT

Some useful links I have found on this topic

http://www.slideshare.net/kigster/12step-program-for-scaling-web-applications-on-postgresql

Use multiple Redis servers in Sidekiq

https://stackoverflow.com/a/19540427/993592

port5432
    Check this out: https://github.com/railsmachine/moonshine. It provides an easy way to manage a cluster and has a built-in plugin to manage a Resque cluster with god (http://godrb.com/) – maximus ツ Mar 20 '15 at 11:34

2 Answers


My personal preference is Sidekiq. I'd be a little concerned about "several hour" jobs and what happens if they fail in the middle. By default Sidekiq will try to re-run them. You can change that, but you definitely want to think through that scenario. This will be true for whatever background job processing system you use, though. IMHO I'd try to find a way to break those big jobs up into smaller jobs. Even if it's just "job part 1 runs, then enqueues job part 2, etc".
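That "part 1 enqueues part 2" chaining can be sketched like this. This is a pure-Ruby sketch with an in-memory array standing in for Redis so it runs anywhere; the class names are hypothetical, and in a real app each class would `include Sidekiq::Worker` and part one would call `ReportPartTwo.perform_async(report_id)`:

```ruby
QUEUE = [] # stands in for Redis; Sidekiq would persist this queue

class ReportPartOne
  def perform(report_id)
    # ... first chunk of work for report_id ...
    # On success, enqueue the next stage instead of running for hours.
    # With Sidekiq this would be: ReportPartTwo.perform_async(report_id)
    QUEUE << ["ReportPartTwo", report_id]
  end
end

class ReportPartTwo
  def perform(report_id)
    # ... second chunk; a mid-job failure only re-runs this stage ...
    "done #{report_id}"
  end
end

# Simulate a worker loop draining the queue:
ReportPartOne.new.perform(42)
klass, id = QUEUE.shift
result = Object.const_get(klass).new.perform(id)
```

The payoff is that a retry after a failure repeats only the stage that failed, not the whole multi-hour run.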

As for scalability Sidekiq's only real limit is Redis. See here for some options on that: https://github.com/mperham/sidekiq/wiki/Sharding

As for load balancing, Sidekiq does it by default. I run two sidekiq servers now that pull from a single Redis instance. 25 workers on each with about 12 queues. Works amazingly well.
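The "two servers, one Redis" setup needs no special wiring on the Sidekiq side; each box just points its Sidekiq at the same Redis. A minimal initializer sketch, assuming a shared Redis reachable at a placeholder URL:

```ruby
# config/initializers/sidekiq.rb
# Same file deployed to every app/worker VM; REDIS_URL and the
# fallback host below are placeholders for your environment.
redis_config = { url: ENV.fetch("REDIS_URL", "redis://redis.internal:6379/0") }

Sidekiq.configure_server do |config|
  config.redis = redis_config # used by the worker processes
end

Sidekiq.configure_client do |config|
  config.redis = redis_config # used by Rails when enqueueing jobs
end
```

Spinning up another worker VM is then just deploying the app and starting another `sidekiq` process with the same `REDIS_URL`.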

Philip Hallstrom
  • thank you for your answer. Does this mean you have the Rails application installed in all three servers? – port5432 Mar 20 '15 at 16:40
  • Yes. I suppose you wouldn't have to if you split things up right, but I do. – Philip Hallstrom Mar 20 '15 at 16:41
  • Are there any guides around on how to do this? I mean if we have a central database, and copies of the Rails server on multiple boxes, how do they know about each other? – port5432 Mar 20 '15 at 16:49
  • Not sure I understand. The app servers don't need to know about each other. They all communicate with the central database. – Philip Hallstrom Mar 20 '15 at 16:50
  • Sorry, I mean is there a master Sidekiq instance that controls the others? If we bring up a new one, how do we register it? – port5432 Mar 20 '15 at 16:53
  • Ah. Each Sidekiq is independent. No need to have a master. They all talk to Redis and know what to do. They don't need each other. That's the nice thing. Take one of the servers down and nothing changes except you lost some capacity. – Philip Hallstrom Mar 20 '15 at 16:54
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/73433/discussion-between-ardochhigh-and-philip-hallstrom). – port5432 Mar 20 '15 at 17:16

I've seen Sidekiq workers hang during network operations, eventually stopping all jobs from running, with no way of knowing until users complain.

ConeyIsland offers more control over job execution than Sidekiq does, and it uses RabbitMQ as its message bus, which is more robust and scales better than Redis.

You can set per-queue and per-job timeouts and configure retry behavior, and a bad job will never cause the worker to hang: it will always continue working other jobs.

Exceptions in jobs are pushed to the notification service of your choice, so you will know when a job goes bad.

http://edraut.github.io/coney_island/

Eric