15

I have an java application reading from a database table for jobs to process, and I may have multiple instances of this application running on different servers as each job is independent. Once a job is picked up for processing, its status will be update to "running". What I want to make sure is the retrieval of to be processed jobs from each instance to be atomic, how can I achieve this using JDBC?

David Zhao
  • 4,284
  • 11
  • 46
  • 60
  • 1
    It is atomic. What you more probably want is for the row to be locked while you process and update it. For that you need SELECT ... FOR UPDATE. – user207421 Aug 29 '16 at 21:05

6 Answers6

6

One approach that would be completely generic*, though perhaps slightly inefficient, would be to use a server-specific identifier to "claim" a job by first updating its status to that identifier, then retrieve the job based on that value. For example, if you were working with Windows servers on the same network then their server name would uniquely identify them. If your table looked like

JobID  JobName  Status
-----  -------  ---------
    1  Job_A    Completed
    2  Job_B
    3  Job_C

where unclaimed jobs have a Status of NULL then your application running on SERVER1 could claim a job by doing setAutoCommit(true) followed by

UPDATE Jobs SET Status='SERVER1'
WHERE JobID IN (
    SELECT TOP 1 JobID FROM Jobs 
    WHERE Status IS NULL
    ORDER BY JobID)

If ExecuteUpdate returns 0 then there are no jobs pending. If it returns 1 then you can get the row with

SELECT JobID, ... FROM Jobs WHERE Status='SERVER1'

and then update its Status to 'Running' with a parameterized query like

UPDATE Jobs SET Status='Running' WHERE JobID=?

where you supply the JobID you retrieved from the previous SELECT.

*(i.e., not relying on any specific SQL extensions, explicit locking, or transaction handling)

Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
2

Lock the table using whatever mechanism is supported by your database server.

For example, in Postgres it would be:

LOCK yourtable;

And it's your table for the duration of the transaction.

Other databases will have something similar.

Will Hartung
  • 115,893
  • 19
  • 128
  • 203
  • 1
    He asked about best practice, not worst. – user207421 Aug 29 '16 at 21:06
  • @EJP I really laughed at your comment! :) But then I realized that you may have been too harsh. Will provided a sufficient solution. Best and worst cannot be decided when information is missing. How long does a job take on average? If each job takes a minute or two (or even 2 or 3 seconds), supposing that there are enough machines to handle the load, an application instance could shamelessly lock the `jobs` table in order to atomically claim a job for execution. There would be no bottleneck and the code would be as simple as it gets, right? That would qualify as close-to-best in my book. – xnakos Aug 30 '16 at 12:28
  • 1
    @EJP Could you elaborate on why this is the worst solution? The OP specifically requested the method in which to keep the queries atomic? How would you suggest doing this at the database level without a lock? – slambeth Aug 31 '16 at 17:52
1

Use ResultSet that has CONCUR_READ_ONLY and TYPE_FORWARD_ONLY. If your database jdbc driver supports it, it will only return atomic read of your select time.

According to this documentation, (Table Summary of Visibility of Internal and External Changes)

forward-only cursor will only show your read time results. CONCUR_READ_ONLY will prevent your internal updates.

Atilla Ozgur
  • 14,339
  • 3
  • 49
  • 69
1

When using databases of a transactional nature, one popular practice is to perform ROW-LEVEL LOCKING. Row-level locks prevent multiple transactions from modifying the same row. SELECT for UPDATE is an easy way to achieve this effect. Assuming you have a processes table:

SELECT process_id, status 
from processes
for UPDATE of status SKIP LOCKED;

When done processing, issue

update processes set status = 'updated'
where process_id = :process_id;             --from before

Issue

commit;

to release the lock.

Here's an actual example

Disclaimer: SELECT FOR UPDATE is a form of pessimistic locking and has its caveats as explained by Burleson. However, it might be a viable solution if the client is not web-based and extremely concurrent.

Spade
  • 2,220
  • 1
  • 19
  • 29
  • 1
    Just a quick note, please correct me if I am wrong. `SKIP LOCKED` exists in PostgreSQL (from 9.5 and on), for instance, but not in MySQL. I feel that this very feature (`SKIP LOCKED`) is necessary for concurrency. Or else, another `select ... for update` would get stuck on the row/job lock, only to find out (after getting unstuck) that the row/job is `running`, thus moving on to select the next row/job (for update). In other words, `select ... for update` in MySQL would not be concurrent. Please correct me, if I am wrong. It is nice to finally see `select ... for update` mentioned. – xnakos Sep 01 '16 at 12:02
  • 1
    Yes MySQL does not (yet) offer SKIP LOCKED or NOWAIT options and it wouldn't be able to offer true concurrency using `SELECT... FOR UPDATE`. There is [some work in progress](https://github.com/mysql/mysql-server/compare/5.7...koppenheim:5.7-skip_locked) on this front, but it has been an [open issue](https://dev.mysql.com/worklog/task/?id=3597) for a long time. OP hasn't indicated his database but if using MySQL and focusing on collision-avoidance as well as concurrency equally, I would redirect to a [optimistic locking strategy](http://stackoverflow.com/a/18806907) – Spade Sep 01 '16 at 19:50
1

Problem

Take jobs ready to process and make their status running atomically.

Solution

No need for additional locks. Since an update operation is already atomic by itself in terms of the same query (see the excerpt from the docs below), update the jobs table, setting the status running to those that are ready to be processed and get the result of this update - it will be the jobs you took for processing.

Examples:

Postgres

UPDATE jobs SET status = 'running'
  WHERE status is NULL
RETURNING id;

In terms of JDBC you can go similar to this:

String sql = "update ... returning ...";
boolean hasResult = statement.execute(sql);
if (hasResult) {
    ResultSet rs = statement.getResult();
}

SQL Server

UPDATE jobs SET status = 'running'
  WHERE status is NULL
OUTPUT UPDATED.id;

Excerpt from the Postgres documentation that shows how 2 transactions behave when doing UPDATE on the same table with the same query:

UPDATE will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress).

Artem Novikov
  • 4,087
  • 1
  • 27
  • 33
0

if you want to ensure proper work in concurrent environment in your specific example you can use the server name.

The table will look like:

JobID  JobName  Server  Status
-----  -------  ------- ---------
    1  Job_A    host-1  Completed
    2  Job_A    host-2  Working
    3  Job_B    host-3  Working

if you have multiple instances on the same host add the process id too:

JobID  JobName  Server  ProcessID  Status
-----  -------  ------- ---------- ---------
    1  Job_A    host-1  1000       Completed
    2  Job_A    host-2  1000       Working
    3  Job_A    host-2  1001       Working
    5  Job_B    host-3  1000       Working
Deian
  • 1,237
  • 15
  • 31