What is the best practice for inserting a record if it doesn't already exist?

Question

I know at least three ways to insert a record if it doesn't already exist in a table:

The first one is using if not exist:

IF NOT EXISTS(select 1 from table where <condition>)
    INSERT...VALUES

The second one is using merge:

MERGE table AS target  
USING (SELECT values) AS source 
ON (condition)  
WHEN NOT MATCHED THEN  
INSERT ... VALUES ...

The third one is using insert...select:

INSERT INTO table (<values list>)
SELECT <values list>
WHERE NOT EXISTS(select 1 from table where <condition>)

But which one is the best?

The first option seems to be not thread-safe, as the record might be inserted between the select statement in the if and the insert statement that follows, if two or more users try to insert the same record.

As for the second option, merge seems to be an overkill for this, as the documentation states:

Performance Tip: The conditional behavior described for the MERGE statement works best when the two tables have a complex mixture of matching characteristics. For example, inserting a row if it does not exist, or updating the row if it does match. When simply updating one table based on the rows of another table, improved performance and scalability can be achieved with basic INSERT, UPDATE, and DELETE statements.

So I think the third option is the best for this scenario (only insert the record if it doesn't already exist, no need to update if it does), but I would like to know what SQL Server experts think.

Please note that after the insert, I'm not interested to know whether the record was already there or whether it's a brand new record, I just need it to be there so that I can carry on with the rest of the stored procedure.

If you performed option #1 within a transaction, would you still see a problem there? — Tim Biegeleisen, Jul 21 '16 at 07:08
I would still have to wrap it with a try...catch, wouldn't I? I'm hoping to avoid the use of catch when I can write my code without it... — Zohar Peled, Jul 21 '16 at 07:12
Normally this should NOT happen by design that same primary keys go together (e.g. by keeping key generation automatic), if it does, don't you think it is better to tell the user that the key which you are trying to insert is already there? Therefore using 1st approach with a try...catch should be a better option — sallushan, Jul 21 '16 at 07:27
@sallushan this is assuming there is a surrogate key in the table, and in that case, there should also be at least a unique constraint on the columns that makes up the natural key. so an identity column doesn't solve this problem. — Zohar Peled, Jul 21 '16 at 07:37
@ZoharPeled, the auto key generation is just a thought, my main point for you was to show the end user (or log it) that the key which system is trying to insert is already generated. And this can be done by using your your 1st approach with a try...catch (a simple option imo). Other options may NOT trigger any error (unless you put rows affected check) — sallushan, Jul 21 '16 at 07:43
In my case I'm not interested if the record already existed or if it's new, I only need to have a record in the table, so indicating to the user whether the record is already there or was it created just now is irrelevant for me. — Zohar Peled, Jul 21 '16 at 07:46
All of these approaches are valid and all of them can fail in multi-user environment. To make them safe and to avoid race condition read http://weblogs.sqlteam.com/dang/archive/2007/10/28/Conditional-INSERTUPDATE-Race-Condition.aspx and http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx — Vladimir Baranov, Jul 27 '16 at 08:15
@VladimirBaranov Thanks for the links, Will read them as soon as I can. — Zohar Peled, Jul 27 '16 at 08:31
@VladimirBaranov Thanks, I've read Dan's blog posts and seems like you are correct. If you could post an answer addressing this issue I would love to upvote and accept it. — Zohar Peled, Jul 28 '16 at 06:08
@ZoharPeled, sorry, I'm sick at the moment. You'd better self-answer your question. — Vladimir Baranov, Jul 28 '16 at 08:02
@VladimirBaranov: No rush, you deserve those reputation points. You can answer it whenever you wish. Hope you feel better soon. — Zohar Peled, Jul 28 '16 at 08:16

score 0 · Answer 1 · answered Jul 27 '16 at 05:30

When you need to guarantee the uniqueness of records on a condition that can not to be expressed by a UNIQUE or PRIMARY KEY constraint, you indeed need to make sure that the check for existence and insert are being done in one transaction. You can achieve this by either:

Using one SQL statement performing the check and the insert (your third option)
Using a transaction with the appropriate isolation level

There is a fourth way though that will help you better structure your code and also make it work in situations where you need to process a batch of records at once. You can create a TABLE variable or a temporary table, insert all of the records that need to be inserted in there and then write the INSERT, UPDATE and DELETE statements based on this variable.

Below is (pseudo)code demonstrating this approach:

-- Logic to create the data to be inserted if necessary

DECLARE @toInsert TABLE (idCol INT PRIMARY KEY,dataCol VARCHAR(MAX))
INSERT INTO @toInsert (idCol,dataCol) VALUES (1,'row 1'),(2,'row 2'),(3,'row 3')

-- Logic to insert the data

INSERT INTO realTable (idCol,dataCol)
SELECT TI.*
FROM @toInsert TI
WHERE NOT EXISTS (SELECT 1 FROM realTable RT WHERE RT.dataCol=TI.dataCol)

In many situations I use this approach as it makes the TSQL code easier to read, possible to refactor and apply unit tests to.

Thanks for your answer, however I don't find it very helpful: I never said I don't have a primary key - I do have one. The problem is not to guarantee uniqueness of the record, it's to avoid the use of `Try...catch` when possible. Also, in this specific case, I only need to insert a single row, so creating an extra step in the way as a table variable is not a solution for this question. — Zohar Peled, Jul 27 '16 at 05:43

Zohar Peled · Accepted Answer · 2018-10-15T13:11:29.340

Following Vladimir Baranov's comment, reading Dan Guzman's blog posts about Conditional INSERT/UPDATE Race Condition and “UPSERT” Race Condition With MERGE, seems like all three options suffers from the same drawbacks in a multi-user environment.

Eliminating the merge option as an overkill, we are left with options 1 and 3.

Dan's proposed solution is to use an explicit transaction and add lock hints to the select to avoid race condition.

This way, option 1 becomes:

BEGIN TRANSACTION
IF NOT EXISTS(select 1 from table WITH (UPDLOCK, HOLDLOCK) where <condition>)
BEGIN
    INSERT...VALUES
END
COMMIT TRANSACTION

and option 2 becomes:

BEGIN TRANSACTION
INSERT INTO table (<values list>)
SELECT <values list>
WHERE NOT EXISTS(select 1 from table WITH (UPDLOCK, HOLDLOCK)where <condition>)
COMMIT TRANSACTION

Of course, in both options there need to be some error handling - every transaction should use a try...catch so that we can rollback the transaction in case of an error.

That being said, I think the 3rd option is probably my personal favorite, but I don't think there should be a difference.

Update

Following a conversation I've had with Aaron Bertrand in the comments of some other question - I'm not entirely convinced that using ISOLATION LEVEL is a better solution than individual query hints, but at least that's another option to consider:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;
INSERT INTO table (<values list>)
SELECT <values list>
WHERE NOT EXISTS(select 1 from table where <condition>);
COMMIT TRANSACTION;

What is the best practice for inserting a record if it doesn't already exist?

2 Answers2

Update

Linked

Related