
I have a SQL Server table with records (raw emails) that need to be processed (build the email and send it) in a given order by an external process (mailer). It's not very resource-intensive, but it can take a while with all the parsing, SMTP overhead, etc.

To speed things up I can easily run multiple instances of the mailer process across multiple servers, but I worry that if two were to start at almost the same time they might still overlap a bit and send the same records.

Simplified for the question, my table looks something like this, with each record holding the data for one email.

queueItem
======================
  queueItemID PK
  ...data...
  processed bit
  priority int
  queuedStart datetime
  rowLockName varchar
  rowLockDate datetime

Batch 1 (Server 1)

  • starts at 12:00PM
  • lock/reserve the first 5000 rows (1-5000)
  • select the newly reserved rows
  • begin work

Batch 2 (Server 2)

  • starts at 12:15PM
  • lock/reserve the next 5000 rows (5001-10000)
  • select the newly reserved rows
  • begin work

To lock the rows I have been using the following:

declare @lockName varchar(36)
set @lockName = newid()   -- unique reservation ID for this batch
declare @batchsize int
set @batchsize = 5000

update queueItem
  set rowLockName = @lockName,
      rowLockDate = getdate()
where queueItemID in (
  -- pick the next batch of unprocessed, unreserved, due rows
  select top(@batchsize) queueItemID
  from queueItem
  where processed = 0
    and rowLockName is null
    and queuedStart <= getdate()
  order by priority, queueItemID
)

If I'm not mistaken, the query starts by executing the SELECT subquery and then locks the rows in preparation for the UPDATE. That is fast, but not instantaneous.

My concern is that if I start two batches at nearly the same time (faster than the subquery runs), Batch 1's UPDATE might not have completed yet, so Batch 2's SELECT would still see the records as available and could overwrite Batch 1's reservation (a race condition of sorts).

I have run some tests and so far haven't seen them overlap, but is this a valid concern that will come back to haunt me at the worst possible time?

Perhaps there are better ways to write this query worth looking into, as I am by no means a T-SQL guru.
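For reference, one pattern I've since come across (not battle-tested on my data) folds the select-and-reserve into a single atomic statement using locking hints: UPDLOCK makes the selected rows' locks exclusive for the duration, and READPAST makes a second session skip rows the first has already locked instead of re-selecting them. A rough sketch against the table above:

```sql
declare @lockName varchar(36)
set @lockName = newid()
declare @batchsize int
set @batchsize = 5000

-- Reserve the batch in one atomic statement: concurrent sessions
-- skip (READPAST) rows already locked (UPDLOCK) by another session.
;with nextBatch as (
  select top(@batchsize) rowLockName, rowLockDate
  from queueItem with (rowlock, updlock, readpast)
  where processed = 0
    and rowLockName is null
    and queuedStart <= getdate()
  order by priority, queueItemID
)
update nextBatch
  set rowLockName = @lockName,
      rowLockDate = getdate()

-- Then fetch the reserved rows:
-- select ... from queueItem where rowLockName = @lockName
```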

jfrobishow
  • 1
    Your process could begin by exclusively locking the table; and until you have the lock do nothing; once lock is acquired, update with name, release lock so next process can lock and update. (this means while this process is running updates/inserts to the data would be affected/queued; and if too long may time out) – xQbert Oct 21 '16 at 18:22
  • 1
    You may want to move this to the code review site – Neo Oct 21 '16 at 18:24
  • @xQbert looking at http://stackoverflow.com/a/23759307/70424 it looks like that might be what I am looking for indeed. I wasn't aware of that query hint. – jfrobishow Oct 21 '16 at 18:43
  • @jfrobishow cant we update each 5000 rows in loop using WHILE or GOTO with temporary table in a single statement and in same instance ? if you just want to speed up the update query. – Shushil Bohara Oct 22 '16 at 01:36
  • @Suraz Perhaps I might be missing something from your idea, but wouldn't it risk the same chance of two row being selected for update if multiple sessions run the same update? – jfrobishow Oct 24 '16 at 13:51
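Update: following xQbert's comment and the linked answer, the reservation step can be serialized with `sp_getapplock` so only one session runs it at a time. A sketch of what I'm trying (the resource name `queueItem_reserve` is arbitrary, and the default lock owner is the transaction, so committing releases the lock):

```sql
begin tran

declare @result int
exec @result = sp_getapplock
     @Resource = 'queueItem_reserve',  -- arbitrary app-lock name
     @LockMode = 'Exclusive',
     @LockTimeout = 15000              -- wait up to 15s for the lock

if @result >= 0
begin
  -- run the reservation UPDATE from the question here;
  -- no other session can be inside this block concurrently
  print 'reservation done'
end

commit tran  -- releases the app lock (owner defaults to Transaction)
```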

0 Answers