A thread safe way to check if a row exists before inserting - Is my code correct?

Question

I have a table "INSERTIF" which looks like this -

id  value
S1  s1rocks
S2  s2rocks
S3  s3rocks

Before inserting a row into this table, I wanted to check if the given id exists or not. If it does not exist, then insert. Else, just update the value. I want to do this in a thread safe way. Can you tell me if my code is correct or not ? I tried it and it worked but, I want to be sure that I am not missing anything like performance issues.

EDIT 1- I want to use this code to insert millions of rows, one at a time. Each insert statement is wrapped around the code I have shown.

EDIT 2 - I do not want to use the UPDATE part of my code, only inserting is enough.

I do NOT want to use MERGE because it works with only SQL server 2008 and above

Thanks.

Code -

-- no check insert 
INSERT INTO INSERTIF(ID,VALUE)
VALUES('S1', 's1doesNOTrock')

--insert with checking 

begin tran /* default read committed isolation level is fine */
if not exists 
(select * from INSERTIF with (updlock, rowlock, holdlock) 
where ID = 'S1')
BEGIN
INSERT INTO INSERTIF(ID,VALUE)
VALUES('S1', 's1doesNOTrock')
END
else
/* update */
UPDATE INSERTIF
SET VALUE = 's1doesNOTrock'
WHERE ID = 'S1'
commit /* locks are released here */

Code to create table -

CREATE TABLE [dbo].[INSERTIF](
    [id] [varchar](50) NULL,
    [value] [varchar](50) NULL
)
INSERT [dbo].[INSERTIF] ([id], [value]) VALUES (N'S1', N's1rocks')
INSERT [dbo].[INSERTIF] ([id], [value]) VALUES (N'S2', N's2rocks')
INSERT [dbo].[INSERTIF] ([id], [value]) VALUES (N'S3', N's3rocks')

@JonathanLeffler - Please see my edit. I do NOT want to use MERGE. I Just don't. — Steam, Feb 21 '14 at 21:51
Doing this for a million rows 'one at a time' is insane. See http://stackoverflow.com/questions/12621241/can-i-use-the-merge-statement-in-sql-server-2005. Insert into a temp table and use the accepted answer. — Dean Ward, Feb 21 '14 at 21:55
OK; SQL Server 2005 is almost a decade old; maybe your users should be upgrading to at least the 6-year old version? Why cripple your modern systems so that you can still work with the archaic ones? The MERGE statement might give you an order of magnitude performance benefit — especially if the code you've got involves schlepping data from the DBMS to a client and back again. — Jonathan Leffler, Feb 21 '14 at 21:59
@JonathanLeffler - I cannot make those decisions. Is there a way to make my code efficient for millions of inserts ? — Steam, Feb 21 '14 at 22:20
Bad luck. Let those who can make such decisions know that the SQL Server 2005 customers are hurting the 2008 and more recent customers by significant amounts. No, there isn't a way to make your code efficient for millions of operations; not even if it is all embedded in a stored procedure. It'll be more efficient in a stored procedure (it cuts out the round trips between DBMS and client, which will be a major cause of performance problems otherwise), but still not a patch on the MERGE statement. — Jonathan Leffler, Feb 21 '14 at 22:28
One possible reason for down-votes (and I'm not a down-voter) is that you changed the question significantly after asking the first version; that tends to annoy people. If you've got constraints like 'must work in SQL Server 2005' that we need to know about, then you need to state them up front. Otherwise, people give good answers to the question -- only to find that the question you asked isn't the question that you needed answered. — Jonathan Leffler, Feb 22 '14 at 00:00

score 7 · Accepted Answer · edited May 23 '17 at 12:32

Your question is about the thread-safety of your code. Succinctly, no — it is not thread-safe. (But see below where isolation is discussed.)

You have a (smallish) window of vulnerability because of the TOCTOU (Time of Check, Time of Use) issue between your 'not exists' SELECT and the corresponding action. Assuming you have a unique (primary) key constraint on the id column, you should use the 'Easier to Ask Forgiveness than Permission' paradigm rather than the 'Look Before You Leap' paradigm (see EAFP vs LBYL).

That means you should determine which of two operation sequences you're going to use:

INSERT, but UPDATE if it fails.
UPDATE, but INSERT if no rows are updated.

Either works. If the work will be mostly insert and occasionally update, then 1 is better than 2; if the work will be mostly update with the occasional insert, then 2 is better than 1. You might even work adaptively; keep a track of what happened in the last N rows (where N might be as few as 5 or as many as 500) and use a heuristic to decide which to try on the new row. There still could be a problem if the INSERT fails (because the row existed) but the UPDATE updates nothing (because someone deleted the row after the insert failed). Similarly, there could still be a problem with UPDATE and INSERT too (no row existed, but one was inserted).

Note that the INSERT option depends wholly on a unique constraint to ensure that duplicate rows are not inserted; the UPDATE option is more reliable.

You also need to consider your isolation level — which might change the original answer. If your isolation is high enough to ensure that after the 'not exists' SELECT is executed, no-one else will be able to insert the row that you determined did not exist, then you may be OK. That gets into some nitty-gritty understanding of your DBMS (and I'm not an SQL Server expert).

You'll need to think about transaction boundaries, too; how big a transaction would be appropriate, especially if the source data has a million entries.

I don't know anything about Isolation levels. I am hearing about it for the first time. Right now , the UPDATE option is not necessary for me. Should I have a transaction for each row or for a batch of rows (while checking the existenc of each row i want to insert) ? Thanks. — Steam, Feb 21 '14 at 22:27
It's a balancing act; it depends on the requirements. I'd be inclined to code a configurable batch size, starting at maybe 64, maybe 1024 operations, and then see how it works. — Jonathan Leffler, Feb 21 '14 at 22:29
It is thread safe due to the three locking hints. Specifically `holdlock` [gives serializable semantics](http://technet.microsoft.com/en-us/library/ms187373.aspx) and prevents phantoms. I agree not suitable approach for inserting such a large number of rows though. — Martin Smith, Feb 21 '14 at 22:42
@MartinSmith - Would it be better if I just put a UNIQUE constraint on the ID column of the table ? I don't want to do any updates. Only inserts. — Steam, Feb 21 '14 at 23:01
@blasto Well that changes things significantly. If you don't want to do an upsert and just want to ignore any rows that already exist you can use a unique constraint with `ignore_dup_key` set to on and just insert them. Any rows with duplicate key values will be skipped and just the non dupes inserted. — Martin Smith, Feb 21 '14 at 23:13
@blasto: your question still asks about UPDATE and INSERT, but your comments here indicate that you've changed your mind about what you need to handle. It is difficult for people to help you when you change the rules. If you don't have a unique constraint on the ID column, the column isn't guaranteed to be an identifier. Always put a unique constraint where a unique constraint needs to be enforced, at least until you've measured performance and can show that it is a serious performance hit and you're willing to rewrite all the code that needs it to be unique to accommodate it being non-unique. — Jonathan Leffler, Feb 22 '14 at 00:05

score 1 · Answer 2 · answered Feb 21 '14 at 21:51

This technique is typically called UPSERT. It can be done in SQL Server using MERGE. It works like this:

MERGE INTO A_Table
USING 
    (SELECT 'data_searched' AS Search_Col) AS SRC
    -- Search predicates
    --
    ON A_Table.Data = SRC.Search_Col
WHEN MATCHED THEN
    -- Update part of the 'UPSERT'
    --
    UPDATE SET
        Data = 'data_searched_updated'
WHEN NOT MATCHED THEN
    -- INSERT part of the 'UPSERT'
    --
    INSERT (Data)
    VALUES (SRC.Search_Col);

Also see http://www.sergeyv.com/blog/archive/2010/09/10/sql-server-upsert-equivalent.aspx

EDIT: I see you're using an older SQL Server. In that case you must use multiple statements.

I don't want any MERGE or UPSERT. Can you please tell me if my code is okay or needs modification ? Thanks. — Steam, Feb 21 '14 at 21:53

A thread safe way to check if a row exists before inserting - Is my code correct?

2 Answers2