2

In one of our java applications (based on postgresql db), we have a database table that maintains a list of tasks to be executed. Each row has a json blob for the details of a task as well as scheduled time value.

We have a few java workers/threads whose jobs are to search for tasks that are ready for execution (based on its schedule value), execute and delete them from the table. Execution of a task may take a few seconds.

The problem is, more than one worker may grab the same row, causing duplicate execution of a task, which is something we want to avoid.

One approach is, when doing select to grab a row, do it with FOR UPDATE to lock the row, supposedly preventing other worker from grabbing the same row that's locked.

My concern with this approach is, the row is only locked when the select transaction is being executed in the db (according to this), while the java code is actually executing the row/task that's selected, the locking has gone, another worker can grab it again.

Can some shed some light on whether the above approach is going to work for sure? Thanks!

pktCoder
  • 1,105
  • 2
  • 15
  • 32

2 Answers2

0

Treat the DB calls as atomic instructions and design lock free algos around your table, using updates to change a boolean column "in-progress" from false to true. Could also just be a state int (0=avail, 1=inprogress, N=resultcode).

Make sure you have a partial index on state 0 (and possibly 1 to recover from crashes to find tasks in progress), so that the ...where state=0 remains selective and fast (on top of the scheduled time index of course).

Hope this helps.

user2023577
  • 1,752
  • 1
  • 12
  • 23
  • Thanks @user2023577 for the idea. I had something similar to this too. Right now, I need to get an answer on whether the approach mentioned in OP (doing a `select` with row locking, execute the tasks, remove the row) would work or not. That's the current implementation, but I suspect that it doesn't actually prevent duplicate task execution despite the row locking. – pktCoder Apr 29 '18 at 15:52
0

When one thread has successfully locked the row on a given connection, another one attempting to obtain a lock on the row on a different connection should fail. You should issue the select-for-update with some kind of no-wait clause to request immediate failure if the row is locked.

Now, this doesn't solve the query vs lock race, as a failed lock may interrupt a thread's execution. You can solve that by (in each execution):

  1. Select all records with new tasks (regardless of whether they're being processed or not)
  2. For each new task returned in [1], run a matching select-for-update, then continue with processing the task if the lock fails.
  3. If any lock attempt fails, skip the task without failing the entire process.
ernest_k
  • 44,416
  • 5
  • 53
  • 99
  • Thanks @ernest-kiwele for the answer. Since `select` typically take a 1 ms and execution of tasks takes a few seconds. I wonder after worker 1 finishes running the `select` query and get a row, while it is executing the task (represented by the row) in java space (making api calls to other services etc), Is the row still considered locked, hence prevents other worker from grabbing it? Thanks. – pktCoder Apr 29 '18 at 15:48
  • @pktCoder Then you may need to check the default behavior of the Postgresql JDBC driver. Depending on the defaults, you may need to explicitly set the transaction isolation level: `Connection.setTransactionIsolation (level)` to allow other connections to see the record. But it seems that you can read it, according to this: https://stackoverflow.com/questions/12029470/can-a-locked-row-in-postgres-still-be-read-from – ernest_k Apr 29 '18 at 15:54
  • the SO question you referred to basically points to the postgresql doc I referred to in OP. In that document (most authoritative), it mentions that the lock on the rows will be in effect until the end of transaction. The question is, does the transaction simply means `select ... FOR update ...`? In that case, it's a fraction of a milisecond and it doesn't provide the protection for the next few seconds when the worker is actually executing the task, which may last for a few seconds. – pktCoder Apr 29 '18 at 16:04
  • @pktCoder The transaction is **the time from `select for update` until an a commit or a roll back is done**. So you will run select-for-update, execute the task, then commit, which will prevent other connections from processing the same row, as long as the tasks run the same, consistent code. – ernest_k Apr 29 '18 at 16:09
  • all the workers are run the same, consistent code. So, if I understand it correctly, when worker1 has finished `select` query, in the middle of executing the task, worker2 tries to do the same query, worker2's query will run as if the rows that are selected by worker1 don't exist. Is this right? – pktCoder Apr 29 '18 at 16:18