5

I want to :

  1. Select N rows from a table for processing where flag=0
  2. Do some work on a second table using values from these N rows
  3. Update these N rows and set flag=1

I have parallel processes doing this same work together, and i want to ensure that all get to work on unique rows. How do i ensure that?

newbie
  • 1,485
  • 2
  • 18
  • 43

2 Answers2

5

I assume you are running on SQL Server (because of the tag), if not then my answer is not applicable. Locking alone is not enough. If you use database record locking SqL server will block other processes trying to access the locked row and in effect you will handle only one row at a time. The solution for you is to combine row locking with READPAST hint so the rows locked by someone else will be skipped. Here's what each process should do:

  1. select next unlocked row for processing and lock it
  2. do the work
  3. update the row and end transaction

select top 1 id, ... from TheTable with (updlock, readpast) where flag = 0

//do the work now

update TheTable set flag = 1 where id=<previously retrieved id>

The nice thing here that the operation of selecting the next unlocked row and locking it is atomic so it guarantees that no one else will be able to select the same row.

nightwatch
  • 1,276
  • 2
  • 18
  • 27
  • READPAST hint does the trick, a good article http://www.mssqltips.com/sqlservertip/1257/processing-data-queues-in-sql-server-with-readpast-and-updlock/ – newbie Nov 02 '11 at 17:17
  • With SQL 2008, you can use the OUTPUT clause with the READPAST HINT in the query to combine the queue operation into a single statement.http://www.sqlservercentral.com/articles/Queue+processing/69653/ – newbie Nov 02 '11 at 17:20
  • @newbie Yes, you can use output to lock and retrieve messages and mark it processed in a single query. But I've been comparing the performance for single message processing and there's no gain compared to two queries (select first and then update). Actually the multi-threaded throughput was a little lower when using the OUTPUT query. Maybe in batch processing the Output clause would be faster, but for single messages it's not. – nightwatch Nov 11 '11 at 10:16
0

One way is to have a master program hand out segments to the child threads.

Another way is to lock the table, get CEIL(N/#processes) rows where flag = 0, update the flag to 2, then release the lock. Then the next process will continue since it got the lock, and since flag = 2 it won't get those rows.

You have two ways to lock the table - you can either lock the whole thing, or do SELECT ... FOR UPDATE with a limit (to not get too many rows). See: SELECT FOR UPDATE with SQL Server

Even better than setting the flag to 2 is set the flag to the process_id. Then all you have to do is update all the rows to distribute numbers, then let the process go to work, each checking only their own rows.

Community
  • 1
  • 1
Ariel
  • 25,995
  • 5
  • 59
  • 69