0

I have this TSQL code that checks if the 'sadsadsad' exists and if not inserts it into the table.

if not exists(select id from [ua_subset_composite] where ua = 'sadsadsad')
  begin
    insert into [ua_subset_composite]
    select 'sadsadsad',1,null,null,null,null
  end

My concern is that in production where there will be multiple threads running concurrently, a situation may occur that a record will slip thru between the not exists select and the insert.

I don't want to add a unique constraint on the column and wondering if I can improve this SQL code so that it will guarantee the uniqueness

Woot4Moo
  • 23,987
  • 16
  • 94
  • 151
user1939553
  • 79
  • 1
  • 8
  • 5
    It would be interesting to know why you don't want to add a `UNIQUE` constraint on a column that apparently should be unique? If duplicates will be rare, then just trying the insert and catching the resulting error if there is one can be a reasonable approach. – Pondlife Jan 09 '13 at 17:40
  • 2
    Using a unique constraint is the only sensible solution to this problem. –  Jan 09 '13 at 17:46
  • +1 for unique constraint... can I ask why you were wanting to avoid this? – Michael Fredrickson Jan 09 '13 at 17:49

5 Answers5

2

One way to address this is to use a higher level of isolation (i.e. locking). You could wrap your entire statement in a transaction and use a stricter isolation level.

For example:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION

   <your code here>

COMMIT TRANSACTION
Brian Knight
  • 4,970
  • 28
  • 34
0

You could implement a locking strategy on your database. You have the choice of pessimistic:

when you lock the record for your exclusive use until you have finished with it. It has much better integrity than optimistic locking but requires you to be careful with your application design to avoid deadlocks.

or optimistic:

where you read a record, take note of a version number and check that the version hasn't changed before you write the record back. When you write the record back you filter the update on the version to make sure it's atomic. (i.e. hasn't been updated between when you check the version and write the record to the disk) and update the version in one hit.

If the record is dirty (i.e. different version to yours) you abort the transaction and the user can re-start it.

source

Community
  • 1
  • 1
Woot4Moo
  • 23,987
  • 16
  • 94
  • 151
0

When you perform the select, place a updlock, holdlock on the range being selected:

begin transaction

if not exists(
    select id 
    from [ua_subset_composite] with (updlock, holdlock) 
    where ua = 'sadsadsad')
  begin
    insert into [ua_subset_composite]
    select 'sadsadsad',1,null,null,null,null
  end

commit

The holdlock, equivalent to the serializable isolation level, will have the following effect:

  • Statements cannot read data that has been modified but not yet committed by other transactions.

  • No other transactions can modify data that has been read by the current transaction until the current transaction completes.

  • Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.

Range locks are placed in the range of key values that match the search conditions of each statement executed in a transaction. This blocks other transactions from updating or inserting any rows that would qualify for any of the statements executed by the current transaction. This means that if any of the statements in a transaction are executed a second time, they will read the same set of rows. The range locks are held until the transaction completes. This is the most restrictive of the isolation levels because it locks entire ranges of keys and holds the locks until the transaction completes. Because concurrency is lower, use this option only when necessary.

The updlock is needed in addition to the holdlock... by adding the updlock we prevent a separate process from performing its own select with (updlock, holdlock) statement on the same range at the same time.

Michael Fredrickson
  • 36,839
  • 5
  • 92
  • 109
  • what does it do to those attempting to insert? Bomb out their transaction? – Woot4Moo Jan 09 '13 at 17:33
  • @Woot4Moo Other processes trying to insert will be blocked, and wait for the blocking process to complete and release the lock before continuing. Won't bomb out necessarily, but will definitely reduce concurrency. – Michael Fredrickson Jan 09 '13 at 17:34
  • Ok, to clarify though, doesn't this cause dirty inserts/updates? I could be wrong, just trying to get some clarity. – Woot4Moo Jan 09 '13 at 17:35
  • @Woot4Moo Hmmm... you're right... the read won't block the other read, allowing the insert to proceed to add a duplicated value. Let me update my answer. – Michael Fredrickson Jan 09 '13 at 17:39
0

This is what I have ended up doing

insert into [ua_subset_composite]  WITH (TABLOCKX) (ua, os)
select @r, 1 
where not exists (select 1 from [ua_subset_composite] nolock where ua = @r

To test the code I ran this code concurrently from multiple windows

declare @r nvarchar(30);
while(1=1)
begin

set @r =  convert(nvarchar(30),getdate(),21 )
insert into [ua_subset_test]  WITH (TABLOCKX) (ua, os)
select @r, 1 
where not exists (select 1 from [ua_subset_test] nolock where ua = @r

)
end
user1939553
  • 79
  • 1
  • 8
-1

Unfortunately, none of the above answers is correct. Beware of any "locking" solution that starts BEGIN TRAN SELECT. Yes, if the isolation level is SERIALIZABLE, SELECT creates locks that prevent other processes from updating the selected data. But what if no data are selected? What's to lock?

IOW, BEGIN TRAN sets up a race condition:

/* spid */
 /* 1 */   SELECT ... -- returns no rows
 /* 2 */   SELECT ... -- returns no rows
 /* 1 */   INSERT ... -- whew! 
 /* 2 */   INSERT ... -- error

To read before writing (say, to present data to the user), there's the special timestamp data type. In your case, though, it's just an insert. Use an atomic transaction, i.e. a single statement:

insert into [ua_subset_composite] (column1, column2)
values ('sadsadsad', 1)
where not exists (
    select 1 from ua_subset_composite 
    where column1 = 'sadsadsad'
)

The server guarantees the row either is or is not inserted. The locking is done for you, where it needs to be, for the shortest possible time, by people who know how. :-)

I don't want to add a unique constraint

Well, you probably should, you know. The above code will prevent trying to add a nonunique value, and avoid an error message. A unique constraint will prevent someone less careful from succeeding.

James K. Lowden
  • 7,574
  • 1
  • 16
  • 31
  • "But what if no data are selected? What's to lock?" `SERIALIZABLE` will issue a range lock... even if the data isn't there at the time of the select, the *criteria* will remained locked for the duration of the transaction. Your answer has the exact same race condition as the OPs original query, even though the select statement is part of the insert, this doesn't prevent two processes from running the select at the same time. – Michael Fredrickson Jan 10 '13 at 23:05
  • Please. A subquery does not create two statements; it's one. It's atomic. It's correct. It's standard. And it's fast. Furthermore, I don't believe `serializable` works as you say, cf. http://blogs.msdn.com/b/sqlserverstorageengine/archive/2006/05/26/range-locks.aspx. It uses range locks, which depend on indexes. – James K. Lowden Jan 16 '13 at 07:10
  • Yes, but the `select` portion of an `insert into... select` statement isn't a sub-query. It is a standard `select` statement that runs under the current isolation level... if the isolation level is `read committed`, then the `select` statement will only issue shared locks, even though it is used in conjunction with an insert statement. [Please see this question and answer for additional details and test cases on this topic.](http://stackoverflow.com/questions/1994771/in-tsql-is-an-insert-with-a-select-statement-safe-in-terms-of-concurrency) – Michael Fredrickson Jan 16 '13 at 16:57
  • Second, you're right that indexes are required for `SERIALIZABLE` to issue range locks, but in the absence of an index on the search criteria, a table lock is issued instead. In both cases, data that was searched for but not yet in the table will be prevented from being added. This is how the `SERIALIZABLE` isolation level is able to prevent phantom reads. – Michael Fredrickson Jan 16 '13 at 16:59