43

Given a table that is acting as a queue, how can I best configure the table/queries so that multiple clients process from the queue concurrently?

For example, the table below indicates a command that a worker must process. When the worker is done, it will set the processed value to true.

| ID | COMMAND | PROCESSED |
|----|---------|-----------|
| 1  | ...     | true      |
| 2  | ...     | false     |
| 3  | ...     | false     |

The clients might obtain one command to work on like so:

select top 1 COMMAND 
from EXAMPLE_TABLE with (UPDLOCK, ROWLOCK) 
where PROCESSED = 0;  -- bit column: 0 = false (SQL Server has no boolean literal)

However, if there are multiple workers, each tries to get the row with ID=2. Only the first will get the pessimistic lock, the rest will wait. Then one of them will get row 3, etc.

What query/configuration would allow each worker client to get a different row each and work on them concurrently?

EDIT:

Several answers suggest variations on using the table itself to record an in-process state. I thought that this would not be possible within a single transaction. (i.e., what's the point of updating the state if no other worker will see it until the txn is committed?) Perhaps the suggestion is:

# start transaction
update to 'processing'
# end transaction
# start transaction
process the command
update to 'processed'
# end transaction

Is this the way people usually approach this problem? It seems to me that the problem would be better handled by the DB, if possible.
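For concreteness, the two-transaction pattern sketched above might look like this in T-SQL (assuming PROCESSED is widened from a boolean into an int status column; the 0/1/2 status values and the `@claimed` table variable are my assumptions, not from the original):

```sql
-- Transaction 1: claim a row and commit immediately,
-- so other workers can see the 'processing' state.
DECLARE @claimed TABLE (ID int, COMMAND nvarchar(max));

BEGIN TRANSACTION;
UPDATE TOP (1) EXAMPLE_TABLE
SET PROCESSED = 1                              -- 1 = processing
OUTPUT INSERTED.ID, INSERTED.COMMAND INTO @claimed
WHERE PROCESSED = 0;                           -- 0 = not processed
COMMIT TRANSACTION;

-- ...process the command outside of any transaction...

-- Transaction 2: mark it done.
DECLARE @id int;
SELECT @id = ID FROM @claimed;

BEGIN TRANSACTION;
UPDATE EXAMPLE_TABLE SET PROCESSED = 2         -- 2 = processed
WHERE ID = @id;
COMMIT TRANSACTION;
```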

Synesso
  • please point me to the original, as SO didn't recommend a dupe. – Synesso Sep 04 '10 at 10:03
  • Why go through all the trouble of re-creating all this functionality when it's already available to any Windows server in the form of Microsoft Message Queueing (MSMQ)? Use what's available – don't constantly re-invent the wheel! – marc_s Sep 04 '10 at 11:10
  • I disagree with marc_s. MSMQ introduces major complexity for sysadmins and developers alike. Million-dollar projects have failed because they used MSMQ but could not deal with its complexity. You should not introduce MSMQ without a compelling reason. – Andomar Sep 04 '10 at 14:10
  • @marc_s: separating your queuing into a different Transaction Manager implies two-phase commit on every operation to coordinate SQL-MSMQ (i.e. tens/low hundreds of operations per second vs. tens of thousands in single-phase commit). Having your messages in MSMQ and your state in the database means *you cannot possibly take a consistent backup*. With MSMQ you lose queryability of the messages. Finally, MSMQ has a 2 GB limit *per store*, which is very small today; you can *easily* run out of MSMQ store space, at which moment it just rolls over and dies. – Remus Rusanu Sep 04 '10 at 17:37
  • @Remus Rusanu: thanks for those interesting insights – those are indeed quite a few serious drawbacks for MSMQ. – marc_s Sep 04 '10 at 18:12
  • Thanks. This is a vendor system. It already uses the DB as a queue; I'm just trying to make it concurrent. We also already have IBM MQSeries in the app, but can't use it for processing these messages due to the existing system design. – Synesso Sep 05 '10 at 09:27

7 Answers

70

I recommend you go over [Using tables as Queues](http://rusanu.com/2010/03/26/using-tables-as-queues/). Properly implemented queues can handle thousands of concurrent users and service as high as 1/2 Million enqueue/dequeue operations per minute. Until SQL Server 2005 the solution was cumbersome and involved mixing a SELECT and an UPDATE in a single transaction with just the right mix of lock hints, as in the article linked by gbn. Luckily, since SQL Server 2005, with the advent of the OUTPUT clause, a much more elegant solution is available, and MSDN now recommends using the OUTPUT clause:

You can use OUTPUT in applications that use tables as queues, or to hold intermediate result sets. That is, the application is constantly adding or removing rows from the table

Basically there are 3 parts of the puzzle you need to get right in order for this to work in a highly concurrent manner:

  1. You need to dequeue atomically. You have to find the row, skip any locked rows, and mark it as 'dequeued' in a single, atomic operation, and this is where the OUTPUT clause comes into play:
    with CTE as (
      SELECT TOP(1) COMMAND, PROCESSED
      FROM TABLE WITH (READPAST)
      WHERE PROCESSED = 0)
    UPDATE CTE
      SET PROCESSED = 1
      OUTPUT INSERTED.*;
  2. You must structure your table with the leftmost clustered index key on the PROCESSED column. If the ID was used as a primary key, move it to the second column in the clustered key. The debate whether to keep a non-clustered index on the ID column is open, but I strongly favor not having any secondary non-clustered indexes on queues:
    CREATE CLUSTERED INDEX cdxTable on TABLE(PROCESSED, ID);
  3. You must not query this table by any means other than Dequeue. Trying to do Peek operations or trying to use the table both as a Queue and as a store will very likely lead to deadlocks and will slow down throughput dramatically.

The combination of atomic dequeue, READPAST hint at searching elements to dequeue and leftmost key on the clustered index based on the processing bit ensure a very high throughput under a highly concurrent load.
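Putting the three pieces together, a complete dequeue might look like this (the table and procedure names here are placeholders of my own, not from the answer):

```sql
CREATE TABLE dbo.CommandQueue (
    Processed bit NOT NULL DEFAULT 0,
    ID int IDENTITY NOT NULL,
    Command nvarchar(max) NULL);

-- Leftmost clustered key on the processed bit, as in point 2.
CREATE CLUSTERED INDEX cdxCommandQueue
    ON dbo.CommandQueue (Processed, ID);
GO
CREATE PROCEDURE dbo.usp_DequeueCommand
AS
BEGIN
    SET NOCOUNT ON;
    -- Atomic dequeue (point 1): find a row, skip rows locked by
    -- other workers (READPAST), and mark it in a single statement.
    WITH cte AS (
        SELECT TOP (1) Command, Processed
        FROM dbo.CommandQueue WITH (ROWLOCK, READPAST)
        WHERE Processed = 0)
    UPDATE cte
    SET Processed = 1
    OUTPUT INSERTED.*;
END
```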

Dale K
Remus Rusanu
  • That ["Using tables as Queues"](http://rusanu.com/2010/03/26/using-tables-as-queues/) article is pure gold for those of us that have to implement queues not suitable for Service Broker (e.g. a pending queue). Thanks mang. – Nick Chammas Nov 01 '11 at 21:36
  • How to reschedule pending message to process later when message_handler wants to Rollback (within the same transaction)? – dariol Jul 20 '12 at 20:05
  • @dario-g: you cannot intentionally rollback in message processing. – Remus Rusanu Jul 21 '12 at 06:24
  • Thanks for the reply. It means that I should catch that situation inside the handler and tell the queue explicitly that it should reschedule my message? – dariol Jul 21 '12 at 21:17
  • @dario-g: If you rolled back, SSB will try again right away; no need to reschedule. If your application has some dependency that blocks it from processing the message and requires it to retry later, not right away, then it is the application's responsibility to schedule such processing. Using [`BEGIN CONVERSATION TIMER`](http://msdn.microsoft.com/en-us/library/ms187804.aspx) can help. If you have further questions, I recommend you post a new question rather than piggyback comments on an old one. – Remus Rusanu Jul 21 '12 at 21:22
  • @RemusRusanu 2 Questions - why are you using CTE and Could I do it without the Top(1)? (get all pending items) – Uri Abramson Feb 13 '14 at 14:52
  • 1
    "Properly implemented queues can handle thousands of concurrent users and service as high as 1/2 Million enqueue/dequeue operations per minute." - is there a source for this claim? – Dan Ling Jan 20 '17 at 20:40
  • @RemusRusanu Old post, but I wonder why a `Service Broker` queue was not recommended here as an alternative – is it not suitable for concurrent processing? – ibubi Jul 14 '17 at 06:27
  • @RemusRusanu How about multiple conditions in the WHERE clause? – Paul Aug 02 '17 at 11:45
  • Hey @RemusRusanu, please can you take a quick look at my question here: https://stackoverflow.com/questions/49341943/why-are-lock-hints-needed-on-an-atomic-statement. FYI: I think your blog post's being taken out of context / some of the advice you give above (e.g. "You must not query this table by any other means but by Dequeue") isn't included on your post, so people are incorrectly using these techniques in similar but at the same time very different scenarios. – JohnLBevan Mar 17 '18 at 21:52
  • The link isn't working... https://web.archive.org/web/20210315223332/http://rusanu.com/2010/03/26/using-tables-as-queues/ – br3nt Apr 08 '21 at 00:57
9

My answer here shows you how to use tables as queues... SQL Server Process Queue Race Condition

You basically need "ROWLOCK, READPAST, UPDLOCK" hints
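A sketch of what that looks like in practice (the table and column names are placeholders, not taken from the linked answer):

```sql
DECLARE @id int, @command nvarchar(max);

BEGIN TRANSACTION;

-- UPDLOCK holds the row for us until commit, ROWLOCK keeps the lock
-- granularity small, READPAST skips rows other workers already hold.
SELECT TOP (1) @id = ID, @command = COMMAND
FROM EXAMPLE_TABLE WITH (ROWLOCK, READPAST, UPDLOCK)
WHERE PROCESSED = 0;

IF @id IS NOT NULL
    UPDATE EXAMPLE_TABLE SET PROCESSED = 1 WHERE ID = @id;

COMMIT TRANSACTION;
```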

gbn
  • The locking hints are useful for protecting changes between a select and an update. They're not required or useful if you pop the queue with a single update statement – Andomar Sep 04 '10 at 13:56
  • @Andomar: this is what is needed: 100% safe concurrency for readers and writers... – gbn Sep 04 '10 at 15:46
1

If you want to serialize your operations for multiple clients, you can simply use application locks.

BEGIN TRANSACTION

EXEC  sp_getapplock @resource = 'app_token', @lockMode = 'Exclusive'

-- perform operation

EXEC  sp_releaseapplock @resource = 'app_token'

COMMIT TRANSACTION
JustBeingHelpful
1

I would stay away from messing with locks in a table. Just create two extra columns like IsProcessing (bit/boolean) and ProcessingStarted (datetime). When a worker crashes or doesn't update his row after a timeout you can have another worker try to process the data.
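A sketch of that approach, with an illustrative 10-minute timeout (the table name and timeout value are my assumptions):

```sql
-- Claim the next row that is either unclaimed, or was claimed by a
-- worker that has made no progress for 10 minutes (presumed dead).
UPDATE TOP (1) EXAMPLE_TABLE
SET IsProcessing = 1,
    ProcessingStarted = SYSUTCDATETIME()
OUTPUT INSERTED.ID, INSERTED.COMMAND
WHERE PROCESSED = 0
  AND (IsProcessing = 0
       OR ProcessingStarted < DATEADD(MINUTE, -10, SYSUTCDATETIME()));
```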

ZippyV
0

Rather than using a boolean value for Processed you could use an int to define the state of the command:

1 = not processed
2 = in progress
3 = complete

Each worker would then get the next row with Processed = 1, update Processed to 2, then begin work. When the work is complete, Processed is updated to 3. This approach also allows for extension to other Processed outcomes; for example, rather than just recording that a worker is complete, you could add statuses for 'Completed Successfully' and 'Completed with Errors'.
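To keep two workers from claiming the same row, the 1 → 2 transition is best done as a single UPDATE (a sketch; the table name and `@claimed` variable are assumed):

```sql
DECLARE @claimed TABLE (ID int, COMMAND nvarchar(max));

-- Atomically move one row from 1 (not processed) to 2 (in progress).
UPDATE TOP (1) EXAMPLE_TABLE
SET PROCESSED = 2
OUTPUT INSERTED.ID, INSERTED.COMMAND INTO @claimed
WHERE PROCESSED = 1;

-- ...process the command, then:
-- UPDATE EXAMPLE_TABLE SET PROCESSED = 3 WHERE ID = @id;
```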

Macros
  • Thank you. Please see my edit. Am I mistaken about your suggestion? – Synesso Sep 04 '10 at 10:10
  • You are correct, you will need separate transactions to allow the other workers to see the update which should be the default behaviour - why would you keep a transaction open whilst the worker processes the command? I can see the parity in that a worker itself is essentially a transaction but this is almost certainly better coded yourself than using Sql Server transactions – Macros Sep 04 '10 at 10:18
  • @Macros any advice, handle "peek lock" with it? – Nuri YILMAZ May 16 '19 at 08:17
0

Probably the better option is to use a tri-state Processed column along with a version/timestamp column. The three values in the Processed column then indicate whether the row is unprocessed, under processing, or processed.

For example

    CREATE TABLE Queue (
        ID INT NOT NULL PRIMARY KEY,
        Command NVARCHAR(100),
        Processed INT NOT NULL CHECK (Processed IN (0,1,2)),
        Version timestamp)

You grab the top 1 unprocessed row, set its status to 'under processing', and set the status to 'processed' when things are done. Base your status update on the Version and primary key columns. If the update fails, then someone has already been there.

You might want to add a client identifier as well, so that if the client dies while processing, it can restart, look at the last row, and pick up from where it was.
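The optimistic check described above might be written like this (a sketch; `@id` and `@version` would come from the worker's earlier SELECT):

```sql
UPDATE Queue
SET Processed = 1                 -- under processing
WHERE ID = @id
  AND Version = @version;        -- rowversion read when the row was picked

IF @@ROWCOUNT = 0
    PRINT 'Another worker got there first; pick a different row.';
```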

no_one
  • Thanks. Please see my edit. Also, for continuous availability, I would want any available client to resume a failed job - not just the one that failed. – Synesso Sep 04 '10 at 10:13
0

One way is to mark the row with a single update statement. If you read the status in the where clause and change it in the set clause, no other process can come in between, because the row will be locked. For example:

declare @pickup_id int
set @pickup_id = -1

set rowcount 1

update  YourTable
set     status = 'picked up'
,       @pickup_id = id
where   status = 'new'

set rowcount 0

return @pickup_id

This uses rowcount to update one row at most. If no row was found, @pickup_id will be -1.
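Microsoft has deprecated SET ROWCOUNT for data-modification statements; the same single-statement pop can be written with UPDATE TOP (1) (my variation, not part of the original answer):

```sql
declare @pickup_id int
set @pickup_id = -1

update top (1) YourTable
set     status = 'picked up'
,       @pickup_id = id
where   status = 'new'

-- @pickup_id is still -1 if no 'new' row was found
return @pickup_id
```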

Andomar