Inserting records into DB table in multithreaded environment

Question

I have this TSQL code that checks if the 'sadsadsad' exists and if not inserts it into the table.

if not exists(select id from [ua_subset_composite] where ua = 'sadsadsad')
  begin
    insert into [ua_subset_composite]
    select 'sadsadsad',1,null,null,null,null
  end

My concern is that in production where there will be multiple threads running concurrently, a situation may occur that a record will slip thru between the not exists select and the insert.

I don't want to add a unique constraint on the column and wondering if I can improve this SQL code so that it will guarantee the uniqueness

It would be interesting to know why you don't want to add a `UNIQUE` constraint on a column that apparently should be unique? If duplicates will be rare, then just trying the insert and catching the resulting error if there is one can be a reasonable approach. — Pondlife, Jan 09 '13 at 17:40
Using a unique constraint is the only sensible solution to this problem. — , Jan 09 '13 at 17:46
+1 for unique constraint... can I ask why you were wanting to avoid this? — Michael Fredrickson, Jan 09 '13 at 17:49

score 2 · Answer 1 · answered Jan 09 '13 at 17:39

2

One way to address this is to use a higher level of isolation (i.e. locking). You could wrap your entire statement in a transaction and use a stricter isolation level.

For example:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION

   <your code here>

COMMIT TRANSACTION

answered Jan 09 '13 at 17:39

Brian Knight

4,970
28
34

Agree answer to the stated question given a unique constraint is not an option. – paparazzo Jan 09 '13 at 18:12
1

`SERIALIZABLE` by itself will result in a deadlock if two processes run at the same time... the `updlock` is needed as well... – Michael Fredrickson Jan 10 '13 at 23:06

score 0 · Answer 2 · edited May 23 '17 at 11:59

You could implement a locking strategy on your database. You have the choice of pessimistic:

when you lock the record for your exclusive use until you have finished with it. It has much better integrity than optimistic locking but requires you to be careful with your application design to avoid deadlocks.

or optimistic:

where you read a record, take note of a version number and check that the version hasn't changed before you write the record back. When you write the record back you filter the update on the version to make sure it's atomic. (i.e. hasn't been updated between when you check the version and write the record to the disk) and update the version in one hit.

If the record is dirty (i.e. different version to yours) you abort the transaction and the user can re-start it.

source

Michael Fredrickson · Answer 3 · 2013-01-09T17:43:16.560

When you perform the select, place a updlock, holdlock on the range being selected:

begin transaction

if not exists(
    select id 
    from [ua_subset_composite] with (updlock, holdlock) 
    where ua = 'sadsadsad')
  begin
    insert into [ua_subset_composite]
    select 'sadsadsad',1,null,null,null,null
  end

commit

The holdlock, equivalent to the serializable isolation level, will have the following effect:

Statements cannot read data that has been modified but not yet committed by other transactions.

No other transactions can modify data that has been read by the current transaction until the current transaction completes.

Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.

Range locks are placed in the range of key values that match the search conditions of each statement executed in a transaction. This blocks other transactions from updating or inserting any rows that would qualify for any of the statements executed by the current transaction. This means that if any of the statements in a transaction are executed a second time, they will read the same set of rows. The range locks are held until the transaction completes. This is the most restrictive of the isolation levels because it locks entire ranges of keys and holds the locks until the transaction completes. Because concurrency is lower, use this option only when necessary.

The updlock is needed in addition to the holdlock... by adding the updlock we prevent a separate process from performing its own select with (updlock, holdlock) statement on the same range at the same time.

what does it do to those attempting to insert? Bomb out their transaction? — Woot4Moo, Jan 09 '13 at 17:33
@Woot4Moo Other processes trying to insert will be blocked, and wait for the blocking process to complete and release the lock before continuing. Won't bomb out necessarily, but will definitely reduce concurrency. — Michael Fredrickson, Jan 09 '13 at 17:34
Ok, to clarify though, doesn't this cause dirty inserts/updates? I could be wrong, just trying to get some clarity. — Woot4Moo, Jan 09 '13 at 17:35
@Woot4Moo Hmmm... you're right... the read won't block the other read, allowing the insert to proceed to add a duplicated value. Let me update my answer. — Michael Fredrickson, Jan 09 '13 at 17:39

score 0 · Accepted Answer · answered Jan 10 '13 at 13:32

This is what I have ended up doing

insert into [ua_subset_composite]  WITH (TABLOCKX) (ua, os)
select @r, 1 
where not exists (select 1 from [ua_subset_composite] nolock where ua = @r

To test the code I ran this code concurrently from multiple windows

declare @r nvarchar(30);
while(1=1)
begin

set @r =  convert(nvarchar(30),getdate(),21 )
insert into [ua_subset_test]  WITH (TABLOCKX) (ua, os)
select @r, 1 
where not exists (select 1 from [ua_subset_test] nolock where ua = @r

)
end

score -1 · Answer 5 · answered Jan 10 '13 at 05:01

-1

Unfortunately, none of the above answers is correct. Beware of any "locking" solution that starts BEGIN TRAN SELECT. Yes, if the isolation level is SERIALIZABLE, SELECT creates locks that prevent other processes from updating the selected data. But what if no data are selected? What's to lock?

IOW, BEGIN TRAN sets up a race condition:

/* spid */
 /* 1 */   SELECT ... -- returns no rows
 /* 2 */   SELECT ... -- returns no rows
 /* 1 */   INSERT ... -- whew! 
 /* 2 */   INSERT ... -- error

To read before writing (say, to present data to the user), there's the special timestamp data type. In your case, though, it's just an insert. Use an atomic transaction, i.e. a single statement:

insert into [ua_subset_composite] (column1, column2)
values ('sadsadsad', 1)
where not exists (
    select 1 from ua_subset_composite 
    where column1 = 'sadsadsad'
)

The server guarantees the row either is or is not inserted. The locking is done for you, where it needs to be, for the shortest possible time, by people who know how. :-)

I don't want to add a unique constraint

Well, you probably should, you know. The above code will prevent trying to add a nonunique value, and avoid an error message. A unique constraint will prevent someone less careful from succeeding.

answered Jan 10 '13 at 05:01

James K. Lowden

7,574
1
16
31

"But what if no data are selected? What's to lock?" `SERIALIZABLE` will issue a range lock... even if the data isn't there at the time of the select, the *criteria* will remained locked for the duration of the transaction. Your answer has the exact same race condition as the OPs original query, even though the select statement is part of the insert, this doesn't prevent two processes from running the select at the same time. – Michael Fredrickson Jan 10 '13 at 23:05
Please. A subquery does not create two statements; it's one. It's atomic. It's correct. It's standard. And it's fast. Furthermore, I don't believe `serializable` works as you say, cf. http://blogs.msdn.com/b/sqlserverstorageengine/archive/2006/05/26/range-locks.aspx. It uses range locks, which depend on indexes. – James K. Lowden Jan 16 '13 at 07:10
Yes, but the `select` portion of an `insert into... select` statement isn't a sub-query. It is a standard `select` statement that runs under the current isolation level... if the isolation level is `read committed`, then the `select` statement will only issue shared locks, even though it is used in conjunction with an insert statement. [Please see this question and answer for additional details and test cases on this topic.](http://stackoverflow.com/questions/1994771/in-tsql-is-an-insert-with-a-select-statement-safe-in-terms-of-concurrency) – Michael Fredrickson Jan 16 '13 at 16:57
Second, you're right that indexes are required for `SERIALIZABLE` to issue range locks, but in the absence of an index on the search criteria, a table lock is issued instead. In both cases, data that was searched for but not yet in the table will be prevented from being added. This is how the `SERIALIZABLE` isolation level is able to prevent phantom reads. – Michael Fredrickson Jan 16 '13 at 16:59

Inserting records into DB table in multithreaded environment

5 Answers5