
I have a SQL Server table with records (raw emails) that need to be processed (build the email and send it) in a given order by an external process (mailer). It's not very resource-intensive, but it can take a while with all the parsing, SMTP overhead, etc.

To speed things up I can easily run multiple instances of the mailer process across multiple servers, but I worry that if two were to start at almost the same time they might still overlap a bit and send the same records.

Simplified for the question, my table looks something like this, with each record holding the data for one email.

queueItem
======================
  queueItemID PK
  ...data...
  processed bit
  priority int
  queuedStart datetime
  rowLockName varchar
  rowLockDate datetime

Batch 1 (Server 1)

  • starts at 12:00PM
  • lock/reserve the first 5000 rows (1-5000)
  • select the newly reserved rows
  • begin work

Batch 2 (Server 2)

  • starts at 12:15PM
  • lock/reserve the next 5000 rows (5001-10000)
  • select the newly reserved rows
  • begin work

To lock the rows I have been using the following:

declare @lockName varchar(36)
set @lockName = newid()   -- unique reservation ID for this batch
declare @batchsize int
set @batchsize = 5000

update queueItem
  set rowLockName = @lockName,
      rowLockDate = getdate()
where queueItemID in (
  -- pick the next batch of unprocessed, unreserved, due rows
  select top(@batchsize) queueItemID
  from queueItem
  where processed = 0
    and rowLockName is null
    and queuedStart <= getdate()
  order by priority, queueItemID
)

If I'm not mistaken, the query starts by executing the SELECT subquery and then locks the rows in preparation for the UPDATE. That is fast, but not instantaneous.

My concern is that if I start two batches at nearly the same time (faster than the subquery runs), Batch 1's UPDATE might not have completed yet, so Batch 2's SELECT would still see the records as available and could overwrite Batch 1's reservation (a race condition of sorts).

I have run some tests and so far haven't seen them overlap, but is this a valid concern that will come back to haunt me at the worst possible time?

Perhaps there are better ways to write this query worth looking into, as I am by no means a T-SQL guru.
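For reference, one pattern I've since come across (not battle-tested on my data) folds the select-and-reserve into a single atomic statement using locking hints: UPDLOCK makes the selected rows' locks exclusive for the duration, and READPAST makes a second session skip rows the first has already locked instead of re-selecting them. A rough sketch against the table above:

```sql
declare @lockName varchar(36)
set @lockName = newid()
declare @batchsize int
set @batchsize = 5000

-- Reserve the batch in one atomic statement: concurrent sessions
-- skip (READPAST) rows already locked (UPDLOCK) by another session.
;with nextBatch as (
  select top(@batchsize) rowLockName, rowLockDate
  from queueItem with (rowlock, updlock, readpast)
  where processed = 0
    and rowLockName is null
    and queuedStart <= getdate()
  order by priority, queueItemID
)
update nextBatch
  set rowLockName = @lockName,
      rowLockDate = getdate()

-- Then fetch the reserved rows:
-- select ... from queueItem where rowLockName = @lockName
```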

jfrobishow
  • 1
    Your process could begin by exclusively locking the table; and until you have the lock do nothing; once lock is acquired, update with name, release lock so next process can lock and update. (this means while this process is running updates/inserts to the data would be affected/queued; and if too long may time out) – xQbert Oct 21 '16 at 18:22
  • 1
    You may want to move this to the code review site – Neo Oct 21 '16 at 18:24
  • @xQbert looking at http://stackoverflow.com/a/23759307/70424 it looks like that might be what I am looking for indeed. I wasn't aware of that query hint. – jfrobishow Oct 21 '16 at 18:43
  • @jfrobishow cant we update each 5000 rows in loop using WHILE or GOTO with temporary table in a single statement and in same instance ? if you just want to speed up the update query. – Shushil Bohara Oct 22 '16 at 01:36
  • @Suraz Perhaps I might be missing something from your idea, but wouldn't it risk the same chance of two row being selected for update if multiple sessions run the same update? – jfrobishow Oct 24 '16 at 13:51
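Update: following xQbert's comment and the linked answer, the reservation step can be serialized with `sp_getapplock` so only one session runs it at a time. A sketch of what I'm trying (the resource name `queueItem_reserve` is arbitrary, and the default lock owner is the transaction, so committing releases the lock):

```sql
begin tran

declare @result int
exec @result = sp_getapplock
     @Resource = 'queueItem_reserve',  -- arbitrary app-lock name
     @LockMode = 'Exclusive',
     @LockTimeout = 15000              -- wait up to 15s for the lock

if @result >= 0
begin
  -- run the reservation UPDATE from the question here;
  -- no other session can be inside this block concurrently
  print 'reservation done'
end

commit tran  -- releases the app lock (owner defaults to Transaction)
```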

0 Answers