
We have a job that we'd like to run on two separate servers at intervals. This job processes an email queue, essentially converting queue records into SMTP messages once per minute. It works great on a single server, but we'd like to avoid a single point of failure and have the job run alternately on each of two (or more) servers.

It would be best if the jobs were not running simultaneously, so I'd like for one of them to be able to detect the other one running and stop without doing any work. Basically, they shouldn't step on each other's toes.

This is somewhat similar to How to Prevent Sql Server Jobs to Run simultaneously except for two details:

  1. I'm using MariaDB, with the InnoDB engine
  2. My code is written in Java

My first instinct is to use the database, much as the answer there suggests. In MariaDB, I might create a "locks" table for this purpose and use it something like this:

CREATE TABLE locks (id VARCHAR(10), PRIMARY KEY (id));
INSERT INTO locks (id) VALUES ('email');

BEGIN;
SELECT id FROM locks WHERE id='email' FOR UPDATE;

[do my work here, in another connection/transaction]

ROLLBACK; -- or COMMIT

My expectation is that the FOR UPDATE will cause the lock record to be locked until I either COMMIT or ROLLBACK that transaction. That means that another process attempting to execute the same code should block waiting for the lock. If I add a timeout on that SELECT query, I can detect a timeout and simply stop the "second" process without continuing with the "real work".
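In Java terms, here's roughly what I have in mind. This is only a sketch: the DataSource, the 2-second innodb_lock_wait_timeout, and the doWork() placeholder are assumptions for illustration, not code we actually run.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import javax.sql.DataSource;

public class EmailJob {

    private final DataSource dataSource; // hypothetical: however connections are obtained

    public EmailJob(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void runOnce() throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                // Fail fast instead of waiting the default 50 seconds for the lock.
                st.execute("SET SESSION innodb_lock_wait_timeout = 2");

                try (ResultSet rs = st.executeQuery(
                        "SELECT id FROM locks WHERE id='email' FOR UPDATE")) {
                    if (!rs.next()) {
                        return; // lock row missing; nothing to claim
                    }
                } catch (SQLException e) {
                    // MariaDB/MySQL signal a lock wait timeout with error code 1205.
                    if (e.getErrorCode() == 1205) {
                        return; // another server holds the lock; skip this run
                    }
                    throw e;
                }

                doWork(); // process the email queue (in another connection/transaction, as above)
            } finally {
                conn.rollback(); // releases the row lock; COMMIT would do just as well
            }
        }
    }

    private void doWork() {
        // hypothetical placeholder for the queue-to-SMTP processing
    }
}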

Will this scheme work the way I'm expecting? Specifically:

  1. Does the SELECT ... FOR UPDATE in a transaction really hold the lock until the transaction is complete? I believe the answer to this is yes -- well, I hope so, because I have lots of other code that relies on this kind of technique.
  2. Will the SELECT ... FOR UPDATE only lock a single row, or is it possible that the database will lock more than that one row? I'd like to be able to use this technique on many different jobs, only one of which is the "email" job. I can create separate tables for each job-type if necessary.
  3. Will this technique work if we are using a clustered database such as Galera? My understanding is that transactional locks in Galera only lock locally and, upon commit, the transaction will be distributed across the cluster. Since we have side-effects outside of the database (SMTP messages), we cannot guarantee that we are avoiding sending the same message twice.

We aren't using a clustered database quite yet, but we expect to be moving that direction, so a solution that can operate in that kind of environment as well would be preferred.

  • Another approach is to have a `status` column in the LOCKS table with a default value such as PENDING. So SELECT id FROM locks WHERE id='email' **AND STATUS = 'PENDING'** FOR UPDATE. Then immediately UPDATE those selected records status to something like WORKING, and commit. Then update status to something like COMPLETED when that record is processed. This allows multiple servers to run jobs concurrently and is easy to recover just by resetting status back to PENDING. – Andrew S Jun 12 '19 at 15:30
  • Does that add anything to the scheme outlined above? The biggest problem I can see is getting something that will work in a clustered-database environment. What I need is something that will *fail to commit* unless the node has successfully claimed the lock. – Christopher Schultz Jun 12 '19 at 18:24
  • That is probably going to be database specific - if the database supports the FOR UPDATE. You would probably need to do a simple experiment to ensure the FOR UPDATE is supported for all the nodes across the cluster. – Andrew S Jun 12 '19 at 19:36
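For reference, here is a rough sketch of the status-column variant described in the first comment above. The PENDING/WORKING/COMPLETED values come from that comment; the ALTER TABLE and the exact statements are assumptions for illustration.

ALTER TABLE locks ADD COLUMN status VARCHAR(10) NOT NULL DEFAULT 'PENDING';

BEGIN;
SELECT id FROM locks WHERE id='email' AND status='PENDING' FOR UPDATE;
UPDATE locks SET status='WORKING' WHERE id='email';
COMMIT;

-- ... do the work ...

UPDATE locks SET status='COMPLETED' WHERE id='email';

-- Recovery after a crash: reset the row so the job can be claimed again.
UPDATE locks SET status='PENDING' WHERE id='email' AND status='WORKING';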

0 Answers