Only inserting a row if it's not already there

Question

I had always used something similar to the following to achieve it:

INSERT INTO TheTable
SELECT
    @primaryKey,
    @value1,
    @value2
WHERE
    NOT EXISTS
    (SELECT
        NULL
    FROM
        TheTable
    WHERE
        PrimaryKey = @primaryKey)

...but once under load, a primary key violation occurred. This is the only statement which inserts into this table at all. So does this mean that the above statement is not atomic?

The problem is that this is almost impossible to recreate at will.

Perhaps I could change it to the something like the following:

INSERT INTO TheTable
WITH
    (HOLDLOCK,
    UPDLOCK,
    ROWLOCK)
SELECT
    @primaryKey,
    @value1,
    @value2
WHERE
    NOT EXISTS
    (SELECT
        NULL
    FROM
        TheTable
    WITH
        (HOLDLOCK,
        UPDLOCK,
        ROWLOCK)
    WHERE
        PrimaryKey = @primaryKey)

Although, maybe I'm using the wrong locks or using too much locking or something.

I have seen other questions on stackoverflow.com where answers are suggesting a "IF (SELECT COUNT(*) ... INSERT" etc., but I was always under the (perhaps incorrect) assumption that a single SQL statement would be atomic.

Does anyone have any ideas?

Have you tried using a merge without a `WHEN MATCHED` clause? — a'r, Aug 04 '10 at 16:59
It varies depending on the client. Anything between and including 2000 and 2008 R2. Although we may have been on 7 when the statement was originally written! — Adam, Aug 04 '10 at 17:20
I must have a look at this new (to me) `MERGE` statement. Does it perform any better in this case? — Adam, Aug 04 '10 at 17:23
I do not see the point ! Just insert your data, and if the PK already exists, the insert will fail, and that will be fine. Or am I missing something ? — iDevlop, Aug 04 '10 at 20:45
You don't know the ROWLOCK hint. But the UPDLOCK, HOLDLOCK hints are required. The only other way to do this is to recover from the insert collision and do an update in that case (which can be a viable strategy for high transaction-per-second systems). — ErikE, Oct 02 '10 at 01:07

score 73 · Accepted Answer · edited Dec 13 '17 at 20:20

73

What about the "JFDI" pattern?

BEGIN TRY
   INSERT etc
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() <> 2627
      RAISERROR etc
END CATCH

Seriously, this is quickest and the most concurrent without locks, especially at high volumes. What if the UPDLOCK is escalated and the whole table is locked?

Read lesson 4:

Lesson 4: When developing the upsert proc prior to tuning the indexes, I first trusted that the If Exists(Select…) line would fire for any item and would prohibit duplicates. Nada. In a short time there were thousands of duplicates because the same item would hit the upsert at the same millisecond and both transactions would see a not exists and perform the insert. After much testing the solution was to use the unique index, catch the error, and retry allowing the transaction to see the row and perform an update instead an insert.

edited Dec 13 '17 at 20:20

David Schwartz

1,956
19
28

answered Aug 04 '10 at 17:42

gbn

422,506
82
585
676

Thanks - okay, I agree that this is probably what I will end up using, and is the answer to the actual question. – Adam Aug 04 '10 at 19:58
1

I know it's bad to rely on errors like this, but I wonder if doing this with just a straight `INSERT` (without the `EXISTS`) would perform better (i.e. try insert no matter what and just ignore error 2627). – Adam Aug 04 '10 at 20:02
1

That depends on whether you mostly insert values that don't exist or mostly values that *do* exist. In the latter case, I'd argue the performance will be poorer due to tons of exceptions being raised and ignored. – GSerg Aug 04 '10 at 21:11
@Gserg: correct. But then OP would have posted an INSERT/UPDATE question, not test for inert arguably. We use this to filter out a few thousand duplicates in dozen million new rows per day – gbn Aug 05 '10 at 04:19
I don't understand this answer – gyozo kudor Feb 10 '16 at 08:46
@gbn - Sorry for being too noob. What is the meaning of 35k tps? Also, how does it makes sure that only distinct records are added in the table? How does `try` - `catch` handle do this? – Zameer Ansari Feb 10 '16 at 14:22
5

@student 35k tps = "35000 transactions per second". The TRY CATCH prevents a duplicate entry from being inserted by catching the *unique constraint* violation error (error number 2627) and ignoring it. The CATCH will only rethrow the error if it is *not* 2627. There is an issue with this snippet because a *unique index* violation is error 2601. So you have to check for both of those codes. This solution also only works for single row INSERTs. If you try to INSERT from one table into another table, you need a different strategy. – Jim Nov 25 '16 at 11:53

score 27 · Answer 2 · edited May 23 '17 at 12:09

27

I added HOLDLOCK which wasn't present originally. Please disregard the version without this hint.

As far as I'm concerned, this should be enough:

INSERT INTO TheTable 
SELECT 
    @primaryKey, 
    @value1, 
    @value2 
WHERE 
    NOT EXISTS 
    (SELECT 0
     FROM TheTable WITH (UPDLOCK, HOLDLOCK)
     WHERE PrimaryKey = @primaryKey)

Also, if you actually want to update a row if it exists and insert if it doesn't, you might find this question useful.

edited May 23 '17 at 12:09

Community

1
1

answered Aug 04 '10 at 17:01

GSerg

76,472
17
159
346

2

What are you locking when the row doesn't exist? – Martin Smith Aug 04 '10 at 17:04
3

A relevant range in the index (the primary key in this case). – GSerg Aug 04 '10 at 17:05
@GSerg Agreed. The pessimistic/optimistic locking of the select statement needs a directive. – DaveWilliamson Aug 04 '10 at 17:06
Thanks. This makes sense to me too. Although when I originally wrote the statement I naively assumed that something magical would happen inside the server! – Adam Aug 04 '10 at 17:18
Two IS's cause no conflict. Can you replace the IDs with object names? – GSerg Aug 04 '10 at 20:06
The point is, two `U`'s are not compatible. In your screenshot there are no `U`'s, which is probably because you table doesn't have an index on the `id` column. The updlock places a `KEY` U-lock on that index. – GSerg Aug 04 '10 at 20:58
I'm testing both cases, and there's always a `U` lock. Can you do the same little test on your side (see my edited answer)? – GSerg Aug 04 '10 at 21:08
The issue with your test (I think) is it is not the same as the OP's original situation. The OP only has one statement and the pattern of lock release seems to be the same regardless of hint for that. Without the hint the locks are released at the end of the statement and before the delay with the hint they are released at the end of the whole transaction. I definitely only see `U` locks when I am running it and the `where id=4` bit matches a record. What are they on? a range? – Martin Smith Aug 04 '10 at 21:30
I believe the trick is that `IU` is not compatible with `U`. The first connection will place an `IU` on the index page because the row does not exist yet, the second will have to place a `U` on the row, which will result in waiting. – GSerg Aug 04 '10 at 22:05
I don't think there's anything stopping 2 concurrent transactions doing `SELECT 0 FROM Table1 with (updlock) WHERE PrimaryKey = x` *as long as `x` doesn't exist* (I just tested that in isolation and the 2 transactions didn't block each other) but I've bugged you enough about this now. Sorry! – Martin Smith Aug 04 '10 at 22:51
No, keep bugging, I sometimes find myself *feeling* what is correct instead of *knowing* it. This is the case. Side note: SQL Server has an interesting effect where two transactions will not block each other even if each of them tries to get an exclusive lock on the same object, *provided that no actual update occurs* (a trick to improve concurrency). We should have a nice stress testing tomorrow. – GSerg Aug 04 '10 at 23:16
2

The testing shows that two `select 'Done.' where exists(select 0 from foo_testing with(updlock) where id = 4);`, *provided `id=4` does not exist*, don't conflict with each other, which means my original answer was actually wrong. The solution is to add the `HOLDLOCK` hint. See the edited answer. Thanks for keeping me bugged :) – GSerg Aug 05 '10 at 12:01
3

There's a great explanation of why this locking is required in Daniel's answer to my (very similar) question: http://stackoverflow.com/questions/3789287/violation-of-unique-key-constraint-on-insert-where-count-0-on-sql-server-200/3791506#3791506 – Iain Sep 25 '10 at 08:35
late comment sorry. I forgot about using locks when I started with TRY/CATCH. Not sure what's best: decreasing concurrency or dealing with an exception. Anyway, +1 for thoroughness. Enjoy the badge ;-) – gbn Jun 27 '11 at 16:41

score 16 · Answer 3 · answered Aug 04 '10 at 17:45

16

You could use MERGE:

MERGE INTO Target
USING (VALUES (@primaryKey, @value1, @value2)) Source (key, value1, value2)
ON Target.key = Source.key
WHEN MATCHED THEN
    UPDATE SET value1 = Source.value1, value2 = Source.value2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Name, ReasonType) VALUES (@primaryKey, @value1, @value2)

answered Aug 04 '10 at 17:45

Chris Smith

5,326
29
29

In this case you can remove the 'WHEN MATCHED THEN' as Adam only needs to insert if missing, not upsert. – Iain Apr 21 '11 at 16:22
5

Sorry, but without adding hold lock hints to your merge statement, you will have the exact problem that the OP is concerned about. – EBarr Jun 11 '11 at 21:53
7

See [this article](http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx) for more on @EBarr's point – Martin Smith Dec 29 '11 at 16:03
1

@MartinSmith - that is the exact article I read when I ran across the issue! Thanks for the reference. – EBarr Dec 30 '11 at 02:04
The MSDN documentation states (in performance tip) that you should use insert where not exists instead of merge unless complexity is required... https://msdn.microsoft.com/en-us/library/bb510625.aspx?f=255&MSPPError=-2147217396 – Mladen Mihajlovic Jan 27 '17 at 13:44

pim · Answer 4 · 2023-07-13T12:30:34.630

To add slightly to @gbn's answer, perhaps "enhance" it. For those, like me, left feeling unsettled with what to do in the <> 2627 scenario (and no an empty CATCH is not an option). I found this little nugget from technet.

    BEGIN TRY
       INSERT etc
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() <> 2627
          BEGIN
                DECLARE @ErrorMessage NVARCHAR(4000);
                DECLARE @ErrorSeverity INT;
                DECLARE @ErrorState INT;

                SELECT @ErrorMessage = ERROR_MESSAGE(),
                @ErrorSeverity = ERROR_SEVERITY(),
                @ErrorState = ERROR_STATE();

                    RAISERROR (
                        @ErrorMessage,
                        @ErrorSeverity,
                        @ErrorState
                    );
          END
    END CATCH

This is exactly the bit that I was left directionless in the previous answer. +1 to the both of you! — T0t3sMcG0t3s, Nov 05 '18 at 15:04

score 1 · Answer 5 · answered Aug 04 '10 at 16:59

1

I don't know if this is the "official" way, but you could try the INSERT, and fall back to UPDATE if it fails.

answered Aug 04 '10 at 16:59

Marcelo Cantos

181,030
38
327
365

Alex from Jitbit · Answer 6 · 2022-05-26T13:57:59.970

In addition to the accepted answer JFDI pattern, you probably want to ignore 2601 errors too (in addition to 2627) which is "Violation of unique index".

...
IF ERROR_NUMBER() NOT IN (2601, 2627) THROW
...

P.S. And if you're already using C# and .NET here's how you can neatly handle this without complicated SQL code using a simple C# 6.0 when statement:

try
{
    connection.Execute("INSERT INTO etc");
}
catch (SqlException ex) when (ex.Number == 2601 || ex.Number == 2627)
{
    //ignore "dup key" errors
}

By the way, here's a good read on the subject: https://michaeljswart.com/2017/07/sql-server-upsert-patterns-and-antipatterns/

score -5 · Answer 7 · edited Aug 16 '13 at 00:39

-5

I've done a similar operation in past using a different method. First, I declare a variable to hold the primary key. Then I populate that variable with the output of a select statement which looks for a record with those values. Then I do and IF statement. If primary key is null, then do insert, else, return some error code.

     DECLARE @existing varchar(10)
    SET @existing = (SELECT primaryKey FROM TABLE WHERE param1field = @param1 AND param2field = @param2)

    IF @existing is not null
    BEGIN
    INSERT INTO Table(param1Field, param2Field) VALUES(param1, param2)
    END
    ELSE
    Return 0
END

edited Aug 16 '13 at 00:39

John Saunders

160,644
26
247
397

answered Aug 04 '10 at 17:06

Marc

29
1
7

Why not just do: IF NOT EXISTS (SELECT * FROM TABLE WHERE param1field = @param1 AND param2field = @param2) BEGIN INSERT INTO Table(param1Field, param2Field) VALUES(param1, param2) END – Vidar Nordnes Aug 04 '10 at 17:29
Yeah, but that looks like it's open to concurrency issues (i.e. what if something happens on another connection between your select and your insert?) – Adam Aug 04 '10 at 17:31
2

@Adam Marc's code above isn't any better for avoiding locking issues. The only two ways to handle concurrency issues are to lock using WITH (UPDLOCK, HOLDLOCK) or to handle the insert error and convert it to an update. – ErikE Oct 02 '10 at 01:06

Only inserting a row if it's not already there

7 Answers7

Linked

Related