
I need to develop a server application (in C#) that will read rows from a simple table (in SQL Server 2005 or 2008), do some work, such as calling a web service, and then update the rows with the resulting status (success, error).

Looks quite simple, but things get tougher when I add the following application requirements:

  • Multiple application instances must be running at the same time, for Load Balancing and Fault Tolerance purposes. Typically, the application will be deployed on two or more servers, and will concurrently access the same database table. Each table row must be processed only once, so a common synchronization/locking mechanism must be used between multiple application instances.

  • When an application instance is processing a set of rows, other application instances shouldn't have to wait for it to end in order to read a different set of rows waiting to be processed.

  • If an application instance crashes, no manual intervention should need to take place on the table rows that were being processed (such as removing temporary status used for application locking on rows that the crashing instance was processing).

  • The rows should be processed in a queue-like fashion, i.e., the oldest rows should be processed first.

Although these requirements don't look too complex, I'm having some trouble coming up with a solution.

I've seen locking hint suggestions, such as XLOCK, UPDLOCK, ROWLOCK, READPAST, etc., but I see no combination of locking hints that will allow me to meet these requirements.

Thanks for any help.

Regards,

Nuno Guerreiro


3 Answers

5

This is a typical table-as-queue pattern, as described in Using tables as Queues. You would use a Pending Queue, and the dequeue transaction should also schedule a retry after a reasonable timeout. It is not realistically possible to hold locks for the duration of the web calls. On success, you would remove the pending item.

You also need to be able to dequeue in batches; dequeuing one-by-one is too slow under serious load (hundreds and thousands of operations per second). So, taking the Pending Queue example from the article linked:

create table PendingQueue (
  id int identity(1,1) not null,  -- identity, since usp_enqueuePending does not supply an id
  DueTime datetime not null,
  Payload varbinary(max),
  constraint pk_pending_id nonclustered primary key(id));

create clustered index cdxPendingQueue on PendingQueue (DueTime);
go

create procedure usp_enqueuePending
  @dueTime datetime,
  @payload varbinary(max)
as
  set nocount on;
  insert into PendingQueue (DueTime, Payload)
    values (@dueTime, @payload);
go

create procedure usp_dequeuePending
  @batchsize int = 100,
  @retryseconds int = 600
as
  set nocount on;
  declare @now datetime;
  set @now = getutcdate();
  with cte as (
    select top(@batchsize) 
      id,
      DueTime,
      Payload
    from PendingQueue with (rowlock, readpast)
    where DueTime < @now
    order by DueTime)
  update cte
    set DueTime = dateadd(second, @retryseconds, @now)  -- schedule the retry relative to now, not the (possibly long past) DueTime
    output deleted.Payload, deleted.id;
go

On successful processing you would remove the item from the queue using the ID. On failure, or on a crash, it would be retried automatically in 10 minutes. One thing you must internalize is that since HTTP does not offer transactional semantics, you will never be able to do this with 100% consistent semantics (e.g. guarantee that no item is processed twice). You can get the error margin very low, but there will always be a moment when the system can crash after the HTTP call succeeded but before the database is updated, and that will cause the same item to be retried, since you cannot distinguish this case from a crash before the HTTP call.
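The success-path removal is not shown in the snippet above; a minimal sketch of it (the procedure name `usp_completePending` is my own, not taken from the linked article) could be:

```sql
create procedure usp_completePending
  @id int
as
  set nocount on;
  -- Delete the item once the HTTP call has succeeded.
  -- If the row is already gone (e.g. the retry fired and another
  -- worker completed it), this is simply a no-op; check @@rowcount
  -- if you need to detect that case.
  delete from PendingQueue where id = @id;
go
```

The application would call this with the `id` returned by `usp_dequeuePending`'s OUTPUT clause.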

Remus Rusanu
  • Looks like the simple and elegant solution I was looking for. I'll read the article in your link and make some tests. Yes, you're right about never reaching 100% consistency in this scenario. I forgot to mention that. Thanks for your reply. – Nuno Guerreiro Jul 05 '12 at 10:46
  • I was planning on having each application instance poll the database table to detect when new items are added, but I'm afraid of running into performance issues. Do you think I could use `SQLDependency` to notify my application instances when new items are added to the queue? Would it fit this multiple application instances scenario? And what would happen if for example 1000 rows are added at once to the queue table? Would the application get notified 1000 times? Could this also turn into a performance bottleneck? – Nuno Guerreiro Jul 05 '12 at 16:37
  • SQLDependency is a no-go. If you are willing to use Service Broker ([which powers the delivery of the Query Notifications used by SQLDependency](http://rusanu.com/2006/06/17/the-mysterious-notification/)) then just use plain Service Broker. But my recommendation would be to poll. A few years ago I did exactly what you're doing (used a table to queue HTTP calls, payments) and intelligent back-off coupled with batch dequeue worked out quite well. – Remus Rusanu Jul 05 '12 at 17:13
4

I initially suggested SQL Server Service Broker for this. However, after some research it turns out this is probably not the best way of handling the problem.

What you're left with is the table architecture you've asked for. However, as you've been finding, it is unlikely that you will be able to come up with a solution that meets all the given criteria, due to the great complexity of locking, transactions, and the pressures placed on such a scheme by high concurrency and high transactions per second.

Note: I am currently researching this issue and will get back to you with more later. The following script was my attempt to meet the given requirements; however, it suffers from frequent deadlocks and processes items out of order. Please stay tuned, and in the meantime consider a destructive-read method (DELETE with OUTPUT or OUTPUT INTO).

SET XACT_ABORT ON; -- blow up the whole tran on any errors
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRAN

UPDATE X
SET X.StatusID = 2 -- in process
OUTPUT Inserted.*
FROM (
   SELECT TOP 1 * FROM dbo.QueueTable WITH (READPAST, ROWLOCK)
   WHERE StatusID = 1 -- ready
   ORDER BY QueuedDate, QueueID -- in case of items with the same date
) X;

-- Do work in application, holding open the tran.

DELETE dbo.QueueTable WHERE QueueID = @QueueID; -- value taken from recordset that was output earlier

COMMIT TRAN;

In the case of several/many rows being locked at once by a single client, there is a possibility of the row locks escalating (in SQL Server, escalation goes directly to a table lock, not to page or extent locks), so be aware of that. Also, holding long-running transactions that maintain locks is normally a big no-no. It may work in this special usage case, but I fear that high tps from multiple clients will make the system break down. Note that normally, the only processes querying your queue table should be those doing queue work; any processes doing reporting should use READ UNCOMMITTED or WITH (NOLOCK) to avoid interfering with the queue in any way.

What is the implication of rows being processed out of order? If an application instance crashes while another instance is successfully completing rows, the crashed instance's rows will be completed late, so the overall processing order will be incorrect.

If the transaction/locking method above is not to your satisfaction, another way to handle your application crashing would be to give your instances names, then set up a monitor process that periodically checks whether those named instances are running. When a named instance starts up, it would always reset any unprocessed rows carrying its instance identifier (something as simple as "instance A" and "instance B" would work). Additionally, when the monitor process finds that an instance is not running, it would reset the rows for that missing instance, enabling the other instances to take them over. There would be a small lag between crash and recovery, but with proper architecture it could be quite reasonable.
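As a sketch of that recovery step (the `InstanceName` column and the `@deadInstance` parameter are illustrative assumptions, not part of the schema above):

```sql
-- Hypothetical recovery statement run by the monitor (or by a
-- restarting instance, with @deadInstance set to its own name):
-- release rows claimed by a crashed instance so others can take them.
update dbo.QueueTable
set StatusID = 1,           -- back to 'ready'
    InstanceName = null     -- clear the ownership marker
where InstanceName = @deadInstance
  and StatusID = 2;         -- still marked 'in process'
```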


ErikE
  • If the messages don't need to be persisted between their source and their destination but only after they are processed then indeed Service Broker would be more appropriate than MSMQ. I'm actually in the process of implementing something with Service Broker's external activation and it's pretty good. I'm going to +1 this just because it's contained almost entirely in SQL Server. – Marcel N. Jul 04 '12 at 18:50
  • 1
    I have invested significant amounts of my own time into test scripts for this answer. To be fair, I did get stuck and so [asked my own question on dba.stackexchange](http://dba.stackexchange.com/questions/20399/fifo-queue-table-for-multiple-workers-in-sql-server). So you may be interested in reading that as well. – ErikE Jul 04 '12 at 22:21
  • Just one note: there's no big deal if rows can't be processed in the order they were created. Thanks for your replies. I'll have a look at your question on the referred forum. – Nuno Guerreiro Jul 05 '12 at 11:00
2

You can't do this with SQL transactions (or with transactions as the main component here). Actually, you can, but you shouldn't: transactions are not meant to be held open this long, and you shouldn't abuse them like this.

Keeping a transaction open for that long (retrieve rows, call the web service, get back to make some updates) is simply not good. And there's no optimistic locking isolation level that will allow you to do what you want.

Using ROWLOCK is also not a good idea, because it's just that. A hint. It's subject to lock escalation, and it can be converted to a table lock.

May I suggest a single entry point to your database? I think it fits in the pub/sub design. So there would be only one component that reads/updates these records:

  1. Reads batches of messages (enough for all your other instances to consume): 1000, 10000, whatever you see fit. It makes these batches available to the other (concurrent) components through some queued medium. I'm not going to say MSMQ :) (it would be the second time today I recommend it, but it's really suitable in your case too).
  2. It marks the messages as in progress or something similar.
  3. Your consumers are all bound, transactionally, to the inbound queue and do their stuff.
  4. When ready, after the web service call, they put the messages in an outbound queue.
  5. The central component picks them up and, inside a distributed transaction, updates the database (if the update fails, the messages will stay in the queue). Since it is the only component that performs that operation, you won't have any concurrency issues, at least not on the database.
  6. In the meantime it can read the next pending batch, and so on.
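Steps 1 and 2 (reading a batch and marking it in progress) can be collapsed into one atomic statement in the central component; a sketch, reusing the column names assumed in the other answers:

```sql
-- Claim up to 1000 ready messages and return them in a single
-- statement, so no concurrent reader can grab the same rows.
-- Note: UPDATE TOP does not guarantee order; if FIFO matters, use a
-- CTE with ORDER BY, as in the first answer's usp_dequeuePending.
update top (1000) dbo.QueueTable
set StatusID = 2                 -- 'in progress'
output inserted.QueueID, inserted.Payload
where StatusID = 1;              -- 'ready'
```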
Marcel N.
  • Locking hints are not hints. Their behavior is documented and guaranteed. – usr Jul 04 '12 at 18:06
  • 1
    @usr: Not always. There are some restrictions, which I linked to in my answer. Have a look here: http://msdn.microsoft.com/en-us/library/ms184286(v=sql.105).aspx. – Marcel N. Jul 04 '12 at 18:09
  • @usr: Updated my answer to be more specific in that regard. Only rowlock mentioned. – Marcel N. Jul 04 '12 at 18:13
  • 1
    Actually: row locks are converted to **table locks** during lock escalation (not to page locks) – marc_s Jul 04 '12 at 19:19
  • @the coon: SQL Server honours hints. MySQL doesn't, which is why MySQL has FORCE and SQL Server doesn't – gbn Jul 05 '12 at 08:01
  • @gbn: True, but you still have lock escalation. So there are specific situations when a lock hint (i.e. rowlock or paglock) are not respected. The link that documents this is in my answer. – Marcel N. Jul 05 '12 at 09:22
  • I forgot to mention that the web service my application will be calling is on the same server and it will receive several table rows at once (about 30 or 50) and will give back the response in about 50 ms or less. So, I'm not expecting to have long running transactions – Nuno Guerreiro Jul 05 '12 at 11:02
  • @user1502124: I understand. But even so, I would make my design scalable, not bound to the current infrastructure. – Marcel N. Jul 05 '12 at 11:36