
I have a multitenant-Rails app with multiple delayed_job workers.

In order to avoid overlapping tenant-specific work, I would like to separate the workers from each other in such a way that each one works on only one tenant-specific task at a time.

I thought about using the (named) queue column and adding "tenant_1", "tenant_2", and so on. Unfortunately, the queues have to be named at configuration time, so this approach isn't flexible enough for many tenants.
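For context, delayed_job workers bind to their named queues at startup via the `--queue`/`--queues` flags (with `-i` to distinguish daemon instances), which is why a queue-per-tenant scheme requires reconfiguring workers whenever the tenant list changes. A sketch of what that looks like (environment and paths assumed):

```shell
# Each worker is pinned to its queues when launched; adding tenant_3
# later means starting (or restarting) another worker process.
RAILS_ENV=production bin/delayed_job -i t1 --queue=tenant_1 start
RAILS_ENV=production bin/delayed_job -i t2 --queues=tenant_2,tenant_3 start
```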

Is there a way to customize the way delayed_job picks the next task? Is there another way to define a scope?

Railsana
    Wouldn't that mean you need to fire up at least 1 worker per tenant in your system? That doesn't sound very scalable unless each tenant also gets their own server. Each worker holds the entire Rails app in memory. – Unixmonkey Mar 11 '19 at 23:28
  • @Unixmonkey only the running workers consume memory, right? Not the ones scheduled. – Raj Mar 12 '19 at 07:12
  • I'm trying to get a pool of workers that each work on a different tenant at a time. – Railsana Mar 12 '19 at 07:50
  • Scheduling a job just means serializing the data needed to run the job and adding it to the `delayed_jobs` table, you still generally have a pool of workers idling waiting to pick up jobs and process them. I could perhaps see this working if you fired up a worker at the same time you scheduled the job, and had it kill itself on completion, but that's still a lot of overhead, and retry logic you'd likely have to re-implement yourself. – Unixmonkey Mar 12 '19 at 13:58

2 Answers


Your best bet is probably to spin up a custom solution that implements a distributed lock. Essentially, the workers all run normally and pull from the usual queues, but before performing work they check with another system (Redis, an RDBMS, an API, whatever) to verify that no other worker is currently performing a job for that tenant. If the tenant is not being worked, set the lock for that tenant and work the job; if the tenant is locked, don't perform the work. A lot of the implementation details are your call: whether to move on and try another job, re-enqueue the job at the back of the queue, count it as a failure against your retry limits, or do something else entirely. This is pretty open-ended, so I'll leave the details to you, but here are some tips:

  • Inheritance will be your friend; define this behavior on a base job and inherit from it on the jobs you expect your workers to run. This also allows you to customize the behavior if you have "special" cases for certain jobs that come up without breaking everything else.
  • Assuming you're not running through ActiveJob (since it wasn't mentioned), read up on delayed_job hooks: https://github.com/collectiveidea/delayed_job/#hooks - they may be an appropriate and/or useful tool
  • Get familiar with the differences and tradeoffs between pessimistic and optimistic locking strategies - this answer is a good starting point: Optimistic vs. Pessimistic locking
  • Read up on general practices surrounding the concept of distributed locks so you can choose the best tools and strategies for yourself. It doesn't have to be a crazy complicated solution; a simple table in the database that stores the tenant identifier is sufficient, but you'll want to consider the failure cases (how do you manage locks that are abandoned, for example?)
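To make the locking protocol concrete, here is a minimal sketch of the acquire/release flow. All names are hypothetical, and a thread-safe in-memory hash stands in for the shared backend; a real implementation would use Redis (e.g. `SET ... NX` with a TTL) or a database table so the lock is visible across worker processes.

```ruby
# Hypothetical sketch of a per-tenant lock. In production, @held would be
# replaced by Redis or a DB table shared by all worker processes.
class TenantLock
  def initialize
    @mutex = Mutex.new
    @held  = {}
  end

  # Returns true if this worker acquired the tenant's lock,
  # false if another worker already holds it.
  def acquire(tenant_id)
    @mutex.synchronize do
      return false if @held.key?(tenant_id)
      @held[tenant_id] = Time.now # record when the lock was taken
      true
    end
  end

  def release(tenant_id)
    @mutex.synchronize { @held.delete(tenant_id) }
  end
end
```

A worker would call `acquire` before running the job and either skip or re-enqueue when it returns false, releasing the lock in an `ensure` block (or via delayed_job's `after`/`failure` hooks) so abandoned locks get cleaned up.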

Seriously consider not doing this: is it really strictly required for the system to operate properly? If so, that's probably indicative of an underlying flaw in your data model or in how you've structured transformations around that data. Strive for ACIDity when thinking about operations on the data and you can avoid a lot of these problems; there's a reason this isn't a commonly available "out of the box" feature of background job runners. If there is an underlying flaw, it won't just bite you on this problem but on something else. Guaranteed!

jimcavoli
  • Thank you! I wasn't aware of the delayed_job hooks, which work great! The flaw in my app is that an API I'm using allows only one connection per tenant. – Railsana Mar 19 '19 at 19:04

If you are trying to prevent two different workers from ever working on the same tenant, that's a bad design choice; something is off, so fix that first. However, if you just want worker instances of the same kind working on different tenants, the easiest solution is below. (The models in the example are hypothetical.)

```ruby
ExpiredOrderCleaner = Struct.new(:tenant_id) do
  def perform
    Order.where(tenant_id: tenant_id).expired.delete_all
  end
end

Tenant.find_each do |tenant|
  Delayed::Job.enqueue ExpiredOrderCleaner.new(tenant.id)
end
```

This will create a unique job for each tenant, and a single worker instance will work on a specific tenant's job at a time. Other kinds of jobs can still run against the same tenant, which is fine, as it should be. If you need a narrower scope, pass more arguments to the job, use them in the query, and use database transactions to avoid collisions.

These best practices hold for any background worker:

  • Make your jobs idempotent and transactional, meaning a job can safely execute multiple times
  • Embrace concurrency: design your jobs so you can run lots of them in parallel
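As an illustration of the idempotency point, here is a hypothetical sketch where a job records a completion key so a retried run becomes a no-op. All names are made up, and an in-memory set stands in for persistence; in a real app the key would be stored in the database inside the same transaction as the work itself.

```ruby
require "set"

# Hypothetical idempotent job: running it twice for the same tenant
# performs the destructive work only once.
class ExpiredOrderCleanup
  def initialize
    @completed = Set.new # stand-in for a persisted completion record
  end

  # Returns the number of orders removed; 0 on a repeated run.
  def perform(tenant_id, orders)
    key = "cleanup:#{tenant_id}"
    return 0 if @completed.include?(key) # retry: safe no-op
    removed = orders.count { |o| o[:expired] }
    @completed << key
    removed
  end
end
```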

Your work will be a lot easier if you use the apartment gem and Active Job wrappers; see the examples in their documentation.

Oshan Wisumperuma