1

I need to run a SQL Server query that either "gets" or "creates" a record. Pretty simple, but I am not sure if this will create any race conditions.

I have read a few articles regarding locking during insert/updates e.g. http://weblogs.sqlteam.com/dang/archive/2007/10/28/Conditional-INSERTUPDATE-Race-Condition.aspx, but I'm not sure if this is relevant to me given I'm using SQL Server 2012, and I don't have contention over modification per se, only over ensuring 'one off creation' of my record.

Also, I will have a primary key constraint over the Id column (not an identity column), so I know the race condition at worst could only ever create an error, and not invalid data, but I also don't want the command to throw an error.

Can someone please shed some light on how I need to solve this? Or am I over thinking this and can I simply do something like:

IF EXISTS(SELECT * FROM Table WHERE Id = @Id)
BEGIN
      SELECT Id, X, Y, Z FROM Table WHERE Id = @Id
END
ELSE
BEGIN
      INSERT INTO Table (Id, X, Y, Z)
      VALUES (@Id, @X, @Y, @Z);

      SELECT @Id, @x, @Y, @Z;
END

I've been in document database land for a few years and my SQL is very rusty.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
AaronHS
  • 1,334
  • 12
  • 29
  • Simply switch to repeatable read isolation level, open a transaction and that's it. – dean May 03 '14 at 17:18

3 Answers3

3

Your code definitely has race conditions.

One approach is to merge the conditions into one insert, but I'm not 100% sure that this protects against all race conditions, but this might work:

INSERT INTO Table(Id, X, Y, Z)
    SELECT @Id, @X, @Y, @Z
    WHERE NOT EXISTS (SELECT 1 FROM Table WHERE ID = @ID);

SELECT Id, X, Y, Z FROM Table WHERE ID = @Id;

The issue is about the underlying locking mechanism during the duplicate detection versus the insert. You can try to use more explicit locking, but a table lock will surely slow down processing (to learn about locks, you can turn to the documentation or a blog post such as this).

The simplest way that I can think to make this work consistently is to use try/catch blocks:

BEGIN TRY
    INSERT INTO Table(Id, X, Y, Z)
        SELECT @Id, @X, @Y, @Z;
END TRY
BEGIN CATCH
END CATCH;

SELECT Id, X, Y, Z FROM Table WHERE ID = @Id;

That is, try the insert. If it fails (presumably because of a duplicate key, but you can explicitly check for that), then ignore the failure. Return the values for the id afterwards.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • i like that the second solution is pragmatic, by it does feel dirty having to resort to exception handling, when in fact the error thrown from the insert wouldn't be 'exceptional' at all. Rather it would be quite a frequent occurrence, particularly as the db gets larger over time. i don't how expensive exception handling is in sql, but in most languages its quite expensive - usually having to unwind the stack and all – AaronHS May 03 '14 at 15:58
  • i guess maybe i could put the insert...where not exists... into the begin try .. end try to minimise the occurrence of the error case – AaronHS May 03 '14 at 16:00
  • @AaronHS [Here's a performance measurement.](http://sqlperformance.com/2012/08/t-sql-queries/error-handling) Yes, you can combine it, though personally I'd opt for the `IF NOT EXISTS(...) BEGIN TRY (insert without check)... END TRY BEGIN CATCH ... END CATCH` syntax. It's highly subjective, but I consider it more readable than the `INSERT ... SELECT ... WHERE NOT EXISTS`. –  May 03 '14 at 16:03
  • @hvd . . . If you do the exception handling, it doesn't make a difference. But, in practice, the big gap for the race condition is between the two queries -- the overhead for preparing the second query and putting it into the engine. That is why I prefer to wrap it in a single query. But, as I say, with proper exception handling, it become an aesthetic consideration. – Gordon Linoff May 03 '14 at 16:11
  • @GordonLinoff No, that's why I included the link. Exception handling is relatively slow. The common case will be where the record already exists when the check is run, or where the record still doesn't exist when the insert starts. Both of those cases will require no exception handling. Only the uncommon case where the record is inserted between the check and the insert requires exception handling. –  May 03 '14 at 16:14
  • @GordonLinoff I may have misunderstood you. If you mean there's no difference between `IF NOT EXISTS(...) BEGIN TRY INSERT ... END TRY BEGIN CATCH ... END CATCH` and `BEGIN TRY INSERT ... WHERE NOT EXISTS(...) END TRY BEGIN CATCH ... END CATCH` other than readability, then yes, I agree, and that's what I tried to say too. –  May 03 '14 at 16:17
  • @hvd . . . That is what I meant to say. Also, that blog post seems to support that "justtrycatch" is a very reasonable solution in terms of performance. Unfortunately, I can't see determine from the blog what the code is for that option, but it sounds like this approach from the name. By the way, thank you for the pointer and the comments. – Gordon Linoff May 03 '14 at 16:29
  • @GordonLinoff Yes, it is indeed `BEGIN TRY (insert without check) ... END TRY BEGIN CATCH END CATCH`. Actually, I may have misread. Performance is significantly worse than checking beforehand if there are many failures, but performance is slightly better than checking beforehand if failures are rare. And in this case, failures should be rare, so your answer is spot on. –  May 03 '14 at 16:58
  • ah... failures wont be rare when doing begin try .. insert without check .. end try. quite the opposite. as the table size increases, the likelihood of the inserts failing are going to grow – AaronHS May 03 '14 at 17:08
  • i have posted a comment on Joel's answer. Do you guys care to comment on your thoughts about his solution? – AaronHS May 03 '14 at 17:09
  • @AaronHS Ah, yes, if failures wouldn't be rare, then do check beforehand. But Joel's answer would have to lock the entire table, would it not? That's something not measured in the link I gave, but I would be very surprised if it gives better performance. –  May 03 '14 at 17:21
3

GET occurs far more frequently than CREATE, so it makes sense to optimize for that.

IF EXISTS (SELECT * FROM MyTable WHERE Id=@Id)
    SELECT Id, X, Y, Z FROM MyTable WHERE Id=@Id
ELSE
BEGIN
    BEGIN TRAN
        IF NOT EXISTS (SELECT * FROM MyTable WITH (HOLDLOCK, UPDLOCK) WHERE Id=@Id )
        BEGIN
        --  WAITFOR DELAY '00:00:10'
            INSERT INTO MyTable (Id, X, Y, Z) VALUES (@Id, @X, @Y, @Z)
        END
        SELECT Id, X, Y, Z FROM MyTable WHERE Id=@Id
    COMMIT TRAN
END

If your record does not exist, begin an explicit transaction. Both HOLDLOCK and UPDLOCK are required (thanks @MikaelEriksson). The table hints hold the shared lock and update lock for the duration of the transaction. This properly serializes the INSERT and reliably avoids the race condition.

To verify, uncomment WAITFOR and run two or more queries at the same time.

Joel Allison
  • 2,091
  • 1
  • 12
  • 9
  • this seems to be the best options given: it is optimised for gets, is completely thread safe, doesn't abuse exception handling and is as readable as all the other solutions. does anyone else care to comment? – AaronHS May 03 '14 at 17:08
  • @AaronHS . . . I'm not sure what you mean by "abuse exception handling". The `try`/`catch` functionality is there for situations such as this. If you have pre-conceived notions from other programming languages, they may not apply to SQL Server. And, this runs into a worse problem, which is deadlocks. Two queries could find themselves in the `else` clause, blocking each other. – Gordon Linoff May 03 '14 at 17:26
  • 1
    If you also add an `UPDLOCK` to the first query you will also prevent the potential deadlock situation. – Mikael Eriksson May 03 '14 at 17:40
  • A deadlock can only occur if there are at least two resources to be locked and two actors that each acquire and hold one of the resources. Serialization is not the same as deadlock. – Joel Allison May 03 '14 at 17:42
  • Two threads can get the shared lock in the first query at the same time and wait for each other to upgrade the lock for the insert. There you have your deadlock. – Mikael Eriksson May 03 '14 at 17:50
  • Add the updlock hint and you have a range lock for the new value in place right from the start that is preventing other threads to get the lock for the same "value". – Mikael Eriksson May 03 '14 at 17:52
  • 1
    @MikaelEriksson, agreed. Edited to reflect your correction. – Joel Allison May 03 '14 at 18:03
  • 1
    You actually need both serializable/holdlock and updlock. As it is now it can give you dupe key errors if executed under the default isolation level read committed. – Mikael Eriksson May 03 '14 at 21:01
  • @GordonLinoff - I agree that error handling is viewed differently in different languages. but ideally, as a general principal (which applies to all languages) you shouldn't be writing code that you *know* is going to generate an error condition during *normal* usage. instead, you should be writing your code in such a way that the *expected* error simply doesn't occur. if the error is an edge case which you're not expecting, sure... but that's not the case here. Nevertheless, thank you for your comments they have been very helpful – AaronHS May 04 '14 at 11:18
0

Why do you need two different SELECTs?

IF NOT EXIST (SELECT TOP 1 1 FROM Table WHERE Id = @Id)
    INSERT INTO Table(Id, X, Y, Z)
    VALUES (@Id, @X, @Y, @Z)

SELECT Id, X, Y, Z FROM Table WHERE ID = @Id

As long as you don't specify WITH (NOLOCK) when you're checking existence, you should be fine.

nhgrif
  • 61,578
  • 25
  • 134
  • 173
  • well, i don't know that i do need them. i'm just not 100% what the best way of writing this query is, whilst avoiding any potential concurrency issues that may or may not exist – AaronHS May 03 '14 at 15:42
  • 1
    Are you saying this will not allow an insert from another process between the `SELECT TOP 1 1 ...` and the `INSERT INTO ...` -- that the select causes a lock? If so, some reference to back that up would improve this answer. (I'm not saying you're wrong, I don't know.) –  May 03 '14 at 15:43
  • If a second thread enters IF NOT EXISTS before first one finishes insert, then both will insert and you will have duplicates – Ivan Anatolievich Jun 01 '18 at 10:33