
We have a multi-tenant app that runs resque for background processing.

The issue we occasionally run into is when a single tenant performs a lot of background work within a very short period of time. This essentially clogs up the queue for a while -- every other tenant's jobs are delayed while we work through the backlog for this single tenant.

Yes, we can add more workers. But that's not really a "solution"; it's more of a band-aid that still results in a delay for other tenants, just a shorter one as we process faster.

Is there a more multi-tenant-friendly way to use resque? Or a more multi-tenant-friendly background queue entirely?

We've been looking at either:

  • using a queue per tenant, and a worker per tenant (dynamically created queues?)
  • modifying resque so that it somehow round-robins through a queue per tenant (a rough sketch follows this list)
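
To make the second idea concrete, this is roughly what we mean by round-robin, assuming per-tenant queues named tenant_<id> (Resque creates queues lazily on first enqueue, so they can appear dynamically):

require 'resque'

# Take at most one job from each tenant queue per pass, so no single
# tenant can monopolize the worker.
loop do
  tenant_queues = Resque.queues.select { |q| q.start_with?('tenant_') }
  worked = false
  tenant_queues.each do |queue|
    if (job = Resque::Job.reserve(queue))
      job.perform
      worked = true
    end
  end
  sleep 5 unless worked # every queue was empty; back off briefly
end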

We're just wondering if there's something we're missing / a better way...

Keith Palmer Jr.

1 Answer


You could use your Rails.cache to maintain a temporary job counter for each tenant and assign jobs to different queues depending on the number of active jobs.

You would need to subclass your jobs to support different queues and write a method that resolves to the correct class for the job. Something like:

class Worker
  # Set per request (e.g. in a before_filter) so enqueuing code knows which
  # tenant the job belongs to. cattr_accessor comes from ActiveSupport.
  cattr_accessor :tenant_id

  # One subclass per Resque queue; Resque reads the queue name from @queue.
  class Low < Worker
    @queue = :low
  end

  class High < Worker
    @queue = :high
  end

  # Resolve the class to enqueue, e.g. Worker::Low or Worker::High.
  def self.queued(tenant_id = self.tenant_id)
    "#{name}::#{resolved_queue(tenant_id)}".constantize
  end

  # Tenants with a large backlog get demoted to the low-priority queue.
  def self.resolved_queue(tenant_id)
    job_count(tenant_id) > 1000 ? 'Low' : 'High'
  end

  def self.cache_key(tenant_id)
    "job_count/#{tenant_id}"
  end

  def self.job_count(tenant_id)
    Rails.cache.fetch(cache_key(tenant_id)) { 0 }
  end

  def self.job_count_increment(tenant_id)
    # fetch first so the key exists before incrementing (some cache
    # stores cannot increment a missing key)
    Rails.cache.fetch(cache_key(tenant_id)) { 0 }
    Rails.cache.increment(cache_key(tenant_id))
  end

  def self.job_count_decrement(tenant_id)
    count = Rails.cache.fetch(cache_key(tenant_id)) { 0 }
    Rails.cache.decrement(cache_key(tenant_id)) if count > 0
  end
end

Then call Worker.queued(tenant_id).perform when you are running the workers, and make sure Worker.tenant_id is set in a before_filter in the application. See Resque Priorities and Queue Lists for more information on queues and priorities.
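
For example, the hook could look like this; current_tenant is a placeholder for however your application resolves the tenant:

class ApplicationController < ActionController::Base
  before_filter :set_worker_tenant

  private

  # current_tenant is a stand-in for your app's tenant lookup
  def set_worker_tenant
    Worker.tenant_id = current_tenant.id
  end
end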

You should call the increment when you enqueue a job and the decrement from within the job itself.
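
Roughly like this, reopening the Worker class from above; enqueue and do_work are illustrative names, not Resque API:

class Worker
  # Bump the tenant's counter at enqueue time, then enqueue against the
  # class that queued resolved (Worker::Low or Worker::High).
  def self.enqueue(tenant_id, *args)
    job_count_increment(tenant_id)
    Resque.enqueue(queued(tenant_id), tenant_id, *args)
  end

  # Resque calls Klass.perform(*args) on the worker side.
  def self.perform(tenant_id, *args)
    do_work(*args) # placeholder for the actual job body
  ensure
    job_count_decrement(tenant_id) # release the slot even if the job raises
  end
end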

Ugly, but workable.

This can be made more DRY with some metaprogramming: extract these methods into a module and generate the queue subclasses when the module is included.
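
A sketch of that module, with illustrative names:

module TenantQueueing
  # Generate the Low/High queue subclasses when the module is included.
  def self.included(base)
    base.extend(ClassMethods)
    %w[Low High].each do |level|
      base.const_set(level, Class.new(base) { @queue = level.downcase.to_sym })
    end
  end

  module ClassMethods
    def queued(tenant_id)
      const_get(resolved_queue(tenant_id))
    end

    # Same cache-based counter as in the Worker example above.
    def resolved_queue(tenant_id)
      Rails.cache.fetch("job_count/#{tenant_id}") { 0 } > 1000 ? 'Low' : 'High'
    end
  end
end

class ImportWorker
  include TenantQueueing
end

ImportWorker.queued(42) # => ImportWorker::Low or ImportWorker::High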

bbozo
  • The only issue with this is that it potentially pushes things into the queue out-of-order, right? So if someone goes over that 1000 limit, then we start pushing stuff into the low queue. But once it processes it down to the 999 level, then we start pushing stuff into the medium queue again. So potentially we could have stuff that executed AFTER the initial large batch, but gets processed BEFORE the tail end of the large batch. Correct? – Keith Palmer Jr. Nov 30 '15 at 19:08
  • Potentially yes, but not likely as last jobs in medium will probably be processed after the first jobs in low – bbozo Dec 01 '15 at 11:42