ATOMicity of a Single Statement
I think your code's fine as it is. i.e. Because you have a single statement which updates the status to printing
as soon as the statement runs the status is updated; so anything running before which searches for print
would have updated the same record to printing
before your process saw it; so your process would pick a subsequent record, or any process hitting it after your statement had run would see it as printing
so would not pick it up`. There isn't really a scenario where a record could pick it up whilst the statement's running since as discussed a single SQL statement should be atomic.
Disclaimer
That said, I'm not enough of an expert to say whether explicit lock hints would help; in my opinion they're not needed as the above is atomic, but others in the comments are likely better informed than me. However, running a test (albeit where database and both threads are run on the same machine) I couldn't create a race condition... maybe if the clients were on different machines / if there was a lot more concurrency you'd be more likely to see issues.
My hope is that others have interpreted your question differently, hence the disagreement.
Attempt to Disprove Myself
Here's the code I used to try to cause a race condition; you can drop it into LINQPad 5
, select language C# Program
, tweak the connectionstring (and optionally any statements) as required, then run:
const long NoOfRecordsToTest = 1000000;
const string ConnectionString = "Server=.;Database=Play;Trusted_Connection=True;"; //assumes a database called "play"
const string DropFifoQueueTable = @"
if object_id('FIFOQueue') is not null
drop table FIFOQueue";
const string CreateFifoQueueTable = @"
create table FIFOQueue
(
Id bigint not null identity (1,1) primary key clustered
, Processed bit default (0) --0=queued, null=processing, 1=processed
)";
const string GenerateDummyData = @"
with cte as
(
select 1 x
union all
select x + 1
from cte
where x < @NoRowsToGenerate
)
insert FIFOQueue(processed)
select 0
from cte
option (maxrecursion 0)
";
const string GetNextFromQueue = @"
with singleRecord as
(
select top (1) Id, Processed
from FIFOQueue --with(updlock, rowlock, readpast) --optionally include this per comment discussions
where processed = 0
order by Id
)
update singleRecord
set processed = null
output inserted.Id";
//we don't really need this last bit for our demo; I've included in case the discussion turns to this..
const string MarkRecordProcessed = @"
update FIFOQueue
set Processed = 1
where Id = @Id";
void Main()
{
SetupTestDatabase();
var task1 = Task<IList<long>>.Factory.StartNew(() => ExampleTaskForced(1));
var task2 = Task<IList<long>>.Factory.StartNew(() => ExampleTaskForced(2));
Task.WaitAll(task1, task2);
foreach (var processedByBothThreads in task1.Result.Intersect(task2.Result))
{
Console.WriteLine("Both threads processed id: {0}", processedByBothThreads);
}
Console.WriteLine("done");
}
static void SetupTestDatabase()
{
RunSql<int>(new SqlCommand(DropFifoQueueTable), cmd => cmd.ExecuteNonQuery());
RunSql<int>(new SqlCommand(CreateFifoQueueTable), cmd => cmd.ExecuteNonQuery());
var generateData = new SqlCommand(GenerateDummyData);
var param = generateData.Parameters.Add("@NoRowsToGenerate",SqlDbType.BigInt);
param.Value = NoOfRecordsToTest;
RunSql<int>(generateData, cmd => cmd.ExecuteNonQuery());
}
static IList<long> ExampleTaskForced(int threadId) => new List<long>(ExampleTask(threadId)); //needed to ensure prevent lazy loadling from causing issues with our tests
static IEnumerable<long> ExampleTask(int threadId)
{
long? x;
while ((x = ProcessNextInQueue(threadId)).HasValue)
{
yield return x.Value;
}
//yield return 55; //optionally return a fake result just to prove that were there a duplicate we'd catch it
}
static long? ProcessNextInQueue(int threadId)
{
var id = RunSql<long?>(new SqlCommand(GetNextFromQueue), cmd => (long?)cmd.ExecuteScalar());
//Debug.WriteLine("Thread {0} is processing id {1}", threadId, id?.ToString() ?? "[null]"); //if you want to see how we're doing uncomment this line (commented out to improve performance / increase the likelihood of a collision
/* then if we wanted to do the second bit we could include this
if(id.HasValue) {
var markProcessed = new SqlCommand(MarkRecordProcessed);
var param = markProcessed.Parameters.Add("@Id",SqlDbType.BigInt);
param.Value = id.Value;
RunSql<int>(markProcessed, cmd => cmd.ExecuteNonQuery());
}
*/
return id;
}
static T RunSql<T>(SqlCommand command, Func<SqlCommand,T> callback)
{
try
{
using (var connection = new SqlConnection(ConnectionString))
{
command.Connection = connection;
command.Connection.Open();
return (T)callback(command);
}
}
catch (Exception e)
{
Debug.WriteLine(e.ToString());
throw;
}
}
Other comments
The above discussion's really only talking about multiple threads taking the next record from the queue whilst avoiding any single record being picked up by multiple threads. There are a few other points...
Race Condition outside of SQL
Per our discussion, if FIFO is mandatory there are other things to worry about. i.e. whilst your threads will pick up each record in order after that it's up to them. e.g. Thread 1
gets record 10
then Thread 2
gets record 11
. Now Thread 2
sends record 11
to the printer before Thread 1
sends record 10
. If they're going to the same printer, your prints would be out of order. If they're different printers, not a problem; all prints on any printer are sequential. I'm going to assume the latter.
Exception Handling
If any exception occurs in a thread that's processing something (i.e. so the thread's record is printing
) then consideration should be made on how to handle this. One option is to keep that thread retrying; though that may be indefinite if it's some fundamental fault. Another is to put the record to some error
status to be handled by another process / where it's accepted that this record won't appear in order. Finally, if the order of invoices in the queue is an ideal rather than a hard requirement you could have the owning thread put the status back to print
so that it or another thread can pick up that record to retry (though again, if there's something fundamentally wrong with the record, this may block up the queue).
My recommendation here is the error
status; as then you have more visibility on the issue / can dedicate another process to dealing with issues.
Crash Handling
Another issue is that because your update to printing
is not held within a transaction, if the server crashes you leave the record with this status in the database, and when your system comes back online it's ignored. Ways to avoid that are to include a column saying which thread is processing it; so that when the system comes back up that thread can resume where it left off, or to include a date stamp so that after some period of time any records with status printing
which "timeout" can be swept up / reset to Error
or Print
statuses as required.
WITH CTE AS
(
SELECT TOP(1) [InvoiceID], [Status], [ThreadId]
FROM INVOICES
WHERE [Status] = 'Print'
OR ([Status] = 'Printing' and [ThreadId] = @ThreadId) --handle previous crash
ORDER BY [PrintRequestedDate], [InvoiceID]
)
UPDATE CTE
SET [Status] = 'Printing'
, [ThreadId] = @ThreadId
OUTPUT Inserted.[InvoiceID]
Awareness of Other Processes
We've mostly been focussed on the printing element; but other processes may also be interacting with your Invoices
table. We can probably assume that aside from creating the initial Draft
record and updating it to Print
once ready for printing, those processes won't touch the Status
field. However, the same records could be locked by completely unrelated processes. If we want to ensure FIFO we must not use the ReadPast
hint, as some records may have status Print
but be locked, so we'd skip over them despite them having the earlier PrintRequestedDate
. However, if we want things to print as quickly as possible, with them being in order when otherwise not inconvenient, including ReadPast
will allow our print process to skip the locked records and carry on, coming back to handle them once they're released.
Similarly another process may lock our record when it's at Printing
status, so we can't update it to mark it as complete. Again, if we want to avoid this from causing a holdup we can make use of the ThreadId
column to allow our thread to leave the record at status Printing
and come back to clean it up later when it's not locked. Obviously this assumes that the ThreadId
column is only used by our print process.
Have a dedicated print queue table
To avoid some of the issues with unrelated processes locking your invoices, move the Status
field into its own table; so you only need to read from the invoices
table; not update it.
This will also have the advantage that (if you don't care about print history) you can delete the record once done, so you'll get better performance (as you won't have to search through the entire invoices table to find those which are ready to print). That said, there's another option for this one (if you're on SQL2008 or above).
Use Filtered Index
Since the Status column will be updated several times it's not a great candidate for an index; i.e. as the record's position in the index jumps from branch to branch as the status progresses.
However, since we're filtering on it it's also something that would really benefit from having an index.
To get around this contradiction, one option's to use a filtered index; i.e. only index those records we're interested in for our print process; so we maintain a small index for a big benefit.
create nonclustered index ixf_Invoices_PrintStatusAndDate
on dbo.Invoices ([Status], [PrintRequestedDate])
include ([InvoiceId]) --just so we don't have to go off to the main table for this, but can get all we need form the index
where [Status] in ('Print','Printing')
Use "Enums" / Reference Tables
I suspect your example uses strings to keep the demo code simple, however including this for completeness.
Using strings in databases can make things hard to support. Rather than having status be a string value, instead use an ID from a related Statuses table.
create table Statuses
(
ID smallint not null primary key clustered --not identity since we'll likely mirror this enum in code / may want to have defined ids
,Name
)
go
insert Statuses
values
(0, 'Draft')
, (1,'Print')
, (2,'Printing')
, (3,'Printed')
create table Invoices
(
--...
, StatusId smallint foreign key references Statuses(Id)
--...
)