We have a job that we'd like to run on two separate servers at intervals. This job processes an email queue by essentially converting queue records into SMTP messages once per minute. It works great on a single server, but we'd like to avoid a single-point-of-failure and have the job run alternatively on each of two (or more) servers.
It would be best of the jobs were not running simultaneously, so I'd like for one of them to be able to detect the other one running and stop without doing any work. Basically, they shouldn't step on each other's toes.
This is somewhat similar to How to Prevent Sql Server Jobs to Run simultaneously except for two details:
- I'm using MariaDB, with the InnoDB engine
- My code is written in Java
My first instinct is to use the database similar to how the above answer suggests. In MariaDB, I might create a "locks" table for this purpose and use it something like this:
CREATE TABLE locks (id VARCHAR 10, PRIMARY KEY id);
INSERT INTO locks (id) VALUES ('email');
BEGIN;
SELECT id FROM locks WHERE id='email' FOR UPDATE;
[do my work, here, in another connection/transaction]
ROLLBACK; // or COMMIT
My expectation is that the FOR UPDATE will cause the lock record to be locked until I either COMMIT
or ROLLBACK
that transaction. That means that another process attempting to execute the same code should block waiting for the lock. If I add a timeout on that SELECT
query, I can detect a timeout and simply stop the "second" process without continuing with the "real work".
Will this scheme work the way I'm expecting? Specifically:
- Does the
SELECT ... FOR UPDATE
in a transaction really hold the lock until the transaction is complete. I believe the answer to this is yes -- well, I hope so, because I have lots of other code that relies on this kind of technique. - Will the
SELECT ... FOR UPDATE
only lock a single row, or is it possible that the database will lock more than that one row? I'd like to be able to use this technique on many different jobs, only one of this is the "email" job. I can create separate tables for each job-type if necessary. - Will this technique work if we are using a clustered database such as Galera? My understanding is that transactional locks in Galera only lock locally and, upon commit, the transaction will be distributed across the cluster. Since we have side-effects outside of the database (SMTP messages), we cannot guarantee that we are avoiding sending the same message twice.
We aren't using a clustered database quite yet, but we expect to be moving that direction, so a solution that can operate in that kind of environment as well would be preferred.