
I have one table that is read at the same time by different threads.

Each thread must select 100 rows, perform some work on each row (unrelated to the database), and then delete the selected rows from the table.

Rows are selected with this query:

SELECT id FROM table_name FOR UPDATE;

My question is: how can I ignore (or skip) rows that were previously locked by a SELECT ... FOR UPDATE statement in MySQL?

Hubert Perron

2 Answers


I typically create a process_id column that defaults to NULL and then have each thread use a unique identifier to do the following:

UPDATE table_name SET process_id = #{process.id} WHERE process_id IS NULL LIMIT 100;

SELECT id FROM table_name WHERE process_id = #{process.id} FOR UPDATE;

That ensures that each thread selects a unique set of rows from the table.
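Putting the two statements together, the whole worker cycle might look like the sketch below (the literal worker id `42`, and the delete step inside the same transaction, are assumptions based on the question; the per-row work happens in application code between the claim and the delete):

```sql
-- Claim up to 100 unclaimed rows for this worker (id 42 is a placeholder).
UPDATE table_name SET process_id = 42 WHERE process_id IS NULL LIMIT 100;

START TRANSACTION;
-- Read back exactly the rows this worker claimed; no other thread will
-- touch them because their process_id is no longer NULL.
SELECT id FROM table_name WHERE process_id = 42 FOR UPDATE;

-- ... perform the per-row work in application code ...

-- Delete the finished rows; COMMIT releases the row locks.
DELETE FROM table_name WHERE process_id = 42;
COMMIT;
```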

Hope this helps.

Ani Menon
Kris Robison
  • In this example, if the script abruptly aborted after the first update, how would you "clean" the database of old processes? Otherwise those 100 rows would never be taken care of. – timetofly Dec 08 '13 at 00:42
  • Good question. I usually set either the same column or a 'state' column to indicate successful completion (as part of the transaction). Then if a process dies, you can set any process_id for a non-existent process back to NULL for another process to pick up. I prefer cron jobs for the cleanup, but you could also make it part of starting a new process. – Kris Robison Dec 11 '13 at 22:20
  • Another method would be to set `locked_at` to the time, well, you've locked the record at. Then you would modify the `UPDATE` statement to update records where either `process_id` is `NULL` or `locked_at` is earlier than some reasonable interval. – shock_one Jan 26 '15 at 20:37
  • @KrisRobison How do you create a process_id that is guaranteed unique? time()+randstring()? – babak faghihian May 09 '16 at 14:49
  • Throw a host name/id (i.e. a namespace) in for good measure, but yes, that would work for a fully distributed lock id. – Kris Robison May 10 '16 at 23:35
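The recovery scheme discussed in the comments can be sketched as follows (the `locked_at` column, the worker id `42`, and the 10-minute staleness interval are assumptions, not part of the original answer):

```sql
-- When claiming rows, record the claim time as well.
UPDATE table_name
SET process_id = 42, locked_at = NOW()
WHERE process_id IS NULL
LIMIT 100;

-- A periodic cleanup (e.g. run from cron) resets claims whose worker
-- presumably died, so another process can pick those rows up again.
UPDATE table_name
SET process_id = NULL, locked_at = NULL
WHERE process_id IS NOT NULL
  AND locked_at < NOW() - INTERVAL 10 MINUTE;
```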

This is not the best solution, but since I know of no way to ignore locked rows outright, I select a candidate row and try to obtain a lock on it.

START TRANSACTION;
SET @v1 = (SELECT myId FROM tests.table WHERE status IS NULL LIMIT 1);
SELECT * FROM tests.table WHERE myId = @v1 FOR UPDATE; #<- lock

I set a small lock-wait timeout for the transaction: if the row is already locked, the transaction aborts and I try another one. If I obtain the lock, I process the row. With bad luck, the row I picked may have been locked, processed, and released by another thread before my timeout expired, in which case I would select a row that has already been processed. To guard against that, each process sets a field (e.g. status) as part of its transaction: if that field shows the other transaction completed successfully, the work has already been done and I do not process the row again.
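The SQL side of that retry loop might look like the following sketch (`innodb_lock_wait_timeout` is a real MySQL session variable; the 2-second timeout and the `'done'` status value are assumptions, and the re-check of `status` inside the locking SELECT is one way to implement the guard described above):

```sql
-- Fail fast instead of waiting the default 50 seconds for a row lock.
SET SESSION innodb_lock_wait_timeout = 2;

START TRANSACTION;
SET @v1 = (SELECT myId FROM tests.table WHERE status IS NULL LIMIT 1);
-- If another session holds the lock longer than the timeout, this statement
-- fails with error 1205 and the application retries with another row.
-- Re-checking status IS NULL skips rows another thread already finished.
SELECT * FROM tests.table WHERE myId = @v1 AND status IS NULL FOR UPDATE;

-- ... process the row, then mark it done inside the same transaction ...
UPDATE tests.table SET status = 'done' WHERE myId = @v1;
COMMIT;
```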

Every alternative without transactions (e.g. setting another field when the row has no status, etc.) can easily lead to race conditions and missed rows: for example, a thread dies abruptly and its allocated rows stay tagged forever, whereas a transaction simply expires (see the comments on the other answer).

Hope it helps.

Filippo Mazza