86

Is a statement in SQL Server ACID?

What I mean by that

Given a single T-SQL statement, not wrapped in a BEGIN TRANSACTION / COMMIT TRANSACTION, are the actions of that statement:

  • Atomic: either all of its data modifications are performed, or none of them is performed.
  • Consistent: When completed, a transaction must leave all data in a consistent state.
  • Isolated: Modifications made by concurrent transactions must be isolated from the modifications made by any other concurrent transactions.
  • Durable: After a transaction has completed, its effects are permanently in place in the system.

The reason I ask

I have a single statement in a live system that appears to be violating these guarantees.

In effect my T-SQL statement is:

--If there are any slots available, 
--then find the earliest unbooked transaction and mark it booked
UPDATE Transactions
SET Booked = 1
WHERE TransactionID = (
   SELECT TOP 1 t2.TransactionID
   FROM Slots
      INNER JOIN Transactions t2
      ON Slots.SlotDate = t2.TransactionDate
   WHERE t2.Booked = 0 --only book it if it's currently unbooked
   AND Slots.Available > 0 --only book it if there's empty slots
   ORDER BY t2.CreatedDate)

Note: A simpler conceptual variant might be:

--Give away one gift, as long as we haven't given away five
UPDATE Gifts
SET GivenAway = 1
WHERE GiftID = (
   SELECT TOP 1 GiftID
   FROM Gifts
   WHERE GivenAway = 0
   AND (SELECT COUNT(*) FROM Gifts g2 WHERE g2.GivenAway = 1) < 5
   ORDER BY GiftValue DESC
)

In both of these statements, notice that they are single statements (UPDATE...SET...WHERE).

There are cases where the wrong transaction is being "booked"; it's actually picking a later transaction. After staring at this for 16 hours, I'm stumped. It's as though SQL Server is simply violating the rules.

I wondered: what if the results of the Slots view are changing before the update happens? What if SQL Server is not holding SHARED locks on the transactions on that date? Is it possible that a single statement can be inconsistent?

So I decided to test it

I decided to check whether the results of sub-queries, or inner operations, are inconsistent. I created a simple table with a single int column:

CREATE TABLE CountingNumbers (
   Value int PRIMARY KEY NOT NULL
)

From multiple connections, in a tight loop, I call the single T-SQL statement:

INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers

In other words, the pseudo-code is:

while (true)
{
    ADOConnection.Execute(sql);
}

And within a few seconds I get:

Violation of PRIMARY KEY constraint 'PK__Counting__07D9BBC343D61337'. 
Cannot insert duplicate key in object 'dbo.CountingNumbers'. 
The duplicate key value is (1332).

Are statements atomic?

The fact that a single statement behaved this way makes me wonder whether single statements really are atomic.

Or is there a more subtle definition of "statement" that differs from (for example) what SQL Server considers a statement?


Does this fundamentally mean that a single T-SQL statement is not atomic?

And if a single statement is atomic, what accounts for the key violation?

From within a stored procedure

Rather than a remote client opening n connections, I tried it with a stored procedure:

CREATE procedure [dbo].[DoCountNumbers] AS

SET NOCOUNT ON;

DECLARE @bumpedCount int
SET @bumpedCount = 0

WHILE (@bumpedCount < 500) --safety valve
BEGIN
   SET @bumpedCount = @bumpedCount+1;

   PRINT 'Running bump '+CAST(@bumpedCount AS varchar(50))

   INSERT INTO CountingNumbers (Value)
   SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers

   IF (@bumpedCount >= 500)
   BEGIN
      PRINT 'WARNING: Bumping safety limit of 500 bumps reached'
   END
END

PRINT 'Done bumping process'

I opened 5 tabs in SSMS, pressed F5 in each, and watched as they, too, violated ACID:

Running bump 414
Msg 2627, Level 14, State 1, Procedure DoCountNumbers, Line 14
Violation of PRIMARY KEY constraint 'PK_CountingNumbers'. 
Cannot insert duplicate key in object 'dbo.CountingNumbers'. 
The duplicate key value is (4414).
The statement has been terminated.

So the failure is independent of ADO or ADO.NET; it happens even directly inside SQL Server.

For 15 years I've been operating under the assumption that a single statement in SQL Server is consistent, and that the only time an explicit transaction is needed is when spanning multiple statements.

What about TRANSACTION ISOLATION LEVEL xxx?

Results for the different variants of the SQL batch I tried (a hinted variant suggested in the comments is sketched after this list):

  • default (read committed): key violation

    INSERT INTO CountingNumbers (Value)
    SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
    
  • default (read committed), explicit transaction: key violation

    BEGIN TRANSACTION
    INSERT INTO CountingNumbers (Value)
    SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
    COMMIT TRANSACTION
    
  • serializable: deadlock

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
    BEGIN TRANSACTION
    INSERT INTO CountingNumbers (Value)
    SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
    COMMIT TRANSACTION
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    
  • snapshot (after altering database to enable snapshot isolation): key violation

    SET TRANSACTION ISOLATION LEVEL SNAPSHOT
    BEGIN TRANSACTION
    INSERT INTO CountingNumbers (Value)
    SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers
    COMMIT TRANSACTION
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
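
For comparison, a hinted variant suggested in Bogdan Sahlean's comment below takes exclusive key-range locks during the read, so no concurrent statement can compute the same MAX. This is a sketch based on that comment, not one of my original test runs:

--X locks plus HOLDLOCK (key-range) block concurrent readers of MAX(Value)
INSERT INTO CountingNumbers (Value)
SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers WITH (XLOCK, HOLDLOCK)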
    

Bonus

  • Microsoft SQL Server 2008 R2 (SP2) - 10.50.4000.0 (X64)
  • Default transaction isolation level (READ COMMITTED)

Turns out every query I've ever written is broken

This certainly changes things. Every update statement I've ever written is fundamentally broken. E.g.:

--Update the user with their last invoice date
UPDATE Users 
SET LastInvoiceDate = (SELECT MAX(InvoiceDate) FROM Invoices WHERE Invoices.uid = Users.uid)

The wrong value can be written, because another invoice could be inserted after the MAX is computed and before the UPDATE happens. Or an example from BOL:

UPDATE Sales.SalesPerson
SET SalesYTD = SalesYTD + 
    (SELECT SUM(so.SubTotal) 
     FROM Sales.SalesOrderHeader AS so
     WHERE so.OrderDate = (SELECT MAX(OrderDate)
                           FROM Sales.SalesOrderHeader AS so2
                           WHERE so2.SalesPersonID = so.SalesPersonID)
     AND Sales.SalesPerson.BusinessEntityID = so.SalesPersonID
     GROUP BY so.SalesPersonID);

Without exclusive holdlocks, the SalesYTD is wrong.
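
For illustration, the same statement with the reads hinted (the UPDLOCK/HOLDLOCK technique from the first answer below) would hold its read locks until the update completes. The hints here are added by analogy and are not part of the BOL example:

UPDATE Sales.SalesPerson
SET SalesYTD = SalesYTD + 
    (SELECT SUM(so.SubTotal) 
     FROM Sales.SalesOrderHeader AS so WITH (UPDLOCK, HOLDLOCK)
     WHERE so.OrderDate = (SELECT MAX(OrderDate)
                           FROM Sales.SalesOrderHeader AS so2 WITH (UPDLOCK, HOLDLOCK)
                           WHERE so2.SalesPersonID = so.SalesPersonID)
     AND Sales.SalesPerson.BusinessEntityID = so.SalesPersonID
     GROUP BY so.SalesPersonID);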

How have I been able to do anything all these years?

  • What exactly do you mean by *"There are cases where the wrong transaction is being "booked"; it's actually picking a later transaction."*? – ypercubeᵀᴹ Jan 30 '14 at 22:37
  • 1
    Can you get it to fail under SERIALIZABLE? Would be interested to see the actual execution plan from a successful update, understand the indexes on the table. – Aaron Bertrand Jan 30 '14 at 22:39
  • @AaronBertrand I can't test it on the "real" system, because I would be affecting real people. I cannot reproduce the problem in our test system, probably because of low concurrency. Converting my toy `CountingNumbers` table to use `SET TRANSACTION ISOLATION LEVEL SERIALIZABLE` causes deadlock. – Ian Boyd Jan 30 '14 at 22:44
  • @ypercube You'll notice in the first update statement I `SELECT TOP 1 Transaction ORDER BY CreatedDate`. In the live environment, sometimes this earlier transaction is passed over and a later one is mistakenly selected (e.g. you put in your stock buy at 9:00am, but someone who put in their buy at 9:20am gets it). – Ian Boyd Jan 30 '14 at 22:46
  • How can you be sure that the earlier transaction was fully committed first? At what point is the timestamp associated with the row decided? If one transaction takes half a second longer to commit, the other one could have started first. – Aaron Bertrand Jan 30 '14 at 22:47
  • So then the fallacy here is that your test satisfies the Isolation part of ACID where it doesn't really? http://hassanszone.wordpress.com/2009/03/04/acid-properties-of-sqlserver-2005/ – billinkc Jan 30 '14 at 22:49
  • 5
    `INSERT INTO CountingNumbers (Value) SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers` isn't safe under UNCOMMITTED, COMMITTED, REPEATABLE READ and SERIALIZABLE because it requires `S` locks. And an `S` lock is compatible with another `S` lock. Other concurrent statements/transactions can still read the same rows. To be safe this approach needs SERIALIZABLE isolation level plus `X` locks or `X` locks + HOLDLOCK table hints: `INSERT INTO CountingNumbers (Value) SELECT ISNULL(MAX(Value), 0)+1 FROM CountingNumbers WITH(XLOCK, HOLDLOCK)`. – Bogdan Sahlean Jan 30 '14 at 23:01
  • 5
    You seem to be making the common mistake of confusing atomic and isolation. Atomic just means succeeds or fails as a unit (transaction all committed or all rolled back). It doesn't say anything about the visibility of changes from concurrent transactions. – Martin Smith Jan 31 '14 at 14:11
  • 3
    @BogdanSahlean - No it refers to the question title. Neither atomicity nor consistency promise what seems to be assumed. Atomic: Just means succeeds or fails as a unit and Consistent: No constraints etc violated. – Martin Smith Jan 31 '14 at 16:05
  • @BogdanSahlean How does optimistic locking work then with `READ COMMITTED`? `UPDATE T SET F = 1, V = 2 WHERE V = 1` and `UPDATE T SET F = 2, V = 2 WHERE V = 1`. What happens if both statements execute at the same time? As I understand the answer to this question, there could be data loss in this case, since V could change as the statement executes but the change wouldn't be read? – plalx Nov 19 '14 at 14:45
  • Your question has a sentence fragment that I'm not sure how I would fix with an edit (other than removing it): "For 15 years i've been operating under the assumption that a single statement in SQL Server is consistent; and the only " – Jon Schneider Jul 28 '16 at 14:38

3 Answers

25

I've been operating under the assumption that a single statement in SQL Server is consistent

That assumption is wrong. The following two transactions have identical locking semantics:

STATEMENT

BEGIN TRAN; STATEMENT; COMMIT

No difference at all. Single statements and auto-commits do not change anything.

So merging all logic into one statement does not help (if it does, it was by accident because the plan changed).

Let's fix the problem at hand. SERIALIZABLE will fix the inconsistency you are seeing because it guarantees that your transactions behave as if they executed single-threadedly. Equivalently, they behave as if they executed instantly.

You will be getting deadlocks. If you are ok with a retry loop, you're done at this point.
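
A minimal retry-loop sketch might look like this (assumes SQL Server 2012+ for THROW; error 1205 is the deadlock-victim error):

DECLARE @retries int = 3;
WHILE @retries > 0
BEGIN
    BEGIN TRY
        SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
        BEGIN TRANSACTION;

        UPDATE Gifts
        SET GivenAway = 1
        WHERE GiftID = (
           SELECT TOP 1 GiftID
           FROM Gifts
           WHERE GivenAway = 0
           AND (SELECT COUNT(*) FROM Gifts g2 WHERE g2.GivenAway = 1) < 5
           ORDER BY GiftValue DESC);

        COMMIT TRANSACTION;
        BREAK;  -- success: leave the retry loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        SET @retries -= 1;
        IF ERROR_NUMBER() <> 1205 OR @retries = 0  -- 1205 = chosen as deadlock victim
        BEGIN
            THROW;  -- not a deadlock, or out of retries: re-raise
        END;
    END CATCH
END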

If you want to invest more time, apply locking hints to force exclusive access to the relevant data:

UPDATE Gifts  -- U-locked anyway
SET GivenAway = 1
WHERE GiftID = (
   SELECT TOP 1 GiftID
   FROM Gifts WITH (UPDLOCK, HOLDLOCK) --this normally just S-locks.
   WHERE GivenAway = 0
    AND (SELECT COUNT(*) FROM Gifts g2 WITH (UPDLOCK, HOLDLOCK) WHERE g2.GivenAway = 1) < 5
   ORDER BY GiftValue DESC
)

You will now see reduced concurrency. That might be totally fine depending on your load.

The very nature of your problem makes achieving concurrency hard. If you require a solution for that we'd need to apply more invasive techniques.

You can simplify the UPDATE a bit:

WITH g AS (
   SELECT TOP 1 Gifts.*
   FROM Gifts
   WHERE GivenAway = 0
    AND (SELECT COUNT(*) FROM Gifts g2 WITH (UPDLOCK, HOLDLOCK) WHERE g2.GivenAway = 1) < 5
   ORDER BY GiftValue DESC
)
UPDATE g  -- U-locked anyway
SET GivenAway = 1

This gets rid of one unnecessary join.

– usr
  • 2
    This is somewhat horrifying to learn. I've been writing single SQL statements for years, under the assumption that the end result of the statement was consistent with what i asked for. Also note that neither of the queries in exactly what the real system is (it would add a lot of complexity to the question). The concept i'm investigating, and having my entire world-view torn asunder over, is if a statement is consistent within itself. (I always knew that multiple statements within the same batch, or multiple batches, needs explicit transactions). – Ian Boyd Jan 30 '14 at 23:10
  • @IanBoyd so you hadn't considered that isolation level plays a role in that? – ypercubeᵀᴹ Jan 30 '14 at 23:18
  • @ypercube I had considered the role of the isolation level; but I only considered READ COMMITTED. – Ian Boyd Jan 31 '14 at 01:14
  • Well now, my entire world has come undone. Every UPDATE or DELETE statement I've ever written is fundamentally wrong. I assumed I could query for values, and have those values be correct for the duration of the statement. – Ian Boyd Jan 31 '14 at 01:17
  • 2
    @IanBoyd true, you now have to audit everything that is important. On the other hand I don't find database concurrency to be very troubling in practice. For some reason, most queries are not problematic. Maybe because they are cold and I can just make them SERIALIZABLE. Sometimes you can use SNAPSHOT isolation to get a consistent view of the database. Sometimes you can use a simple lock hierarchy (example: whenever a transaction wants to modify customer data (maybe add an order), they first U-lock the corresponding Customers row. This eliminates concurrency issues for individual customers.). – usr Jan 31 '14 at 08:33
  • WITH (UPDLOCK, HOLDLOCK) will go away as soon as the SELECT TOP 1... completes. Then the UPDATE g, having the results of the SELECT, will initiate its own locks. If two connections run this statement almost simultaneously, then while the first is at the phase where it awaits its update lock to be granted (locks are first requested and then granted), there is no lock at all, meaning the second connection can read the very same record again. In short, it does not work 100%. – Stefanos Zilellis Apr 27 '21 at 10:08
3

Below is an example of an UPDATE statement that does increment a counter value atomically:

-- Do this once for test setup
CREATE TABLE CountingNumbers (Value int PRIMARY KEY NOT NULL)
INSERT INTO CountingNumbers VALUES(1) 

-- Run this in parallel: start it in two tabs in SQL Server Management Studio.
-- You will see each connection generating new numbers without duplicates and without timeouts.
DECLARE @nextNumber int
WHILE (1=1)
BEGIN
  -- Taking the update lock is only relevant in case this statement is part
  -- of a larger transaction, to prevent deadlock.
  -- When executing without a transaction, the statement will itself be atomic.
  -- The compound assignment both increments Value and captures the
  -- post-increment value in @nextNumber under the same row lock.
  UPDATE CountingNumbers WITH (UPDLOCK, ROWLOCK) SET @nextNumber = Value = Value + 1
  PRINT @nextNumber
END
– Ries Vriend
2

SELECT does not lock exclusively; even SERIALIZABLE locks only for the time the SELECT is executed! Once the SELECT is over, its lock is gone. Then the UPDATE's locks take over, as it now knows what to lock because the SELECT has returned results. Meanwhile, anyone else can SELECT again!

The only sure way to safely read and lock a row is:

--illustrative declarations: @key identifies the row, @d1/@d2 hold the values read
declare @key int, @d1 int, @d2 int

begin transaction

--lock what I need to read (a no-op update takes an exclusive lock on the row)
update mytable set col1=col1 where mykey=@key

--now read what I need
select @d1=col1, @d2=col2 from mytable where mykey=@key

--now do whatever calculations and checks I need on the row I read, to decide my update
if @d1<@d2 set @d1=@d2 else set @d1=@d2 * 2 --just an example calc

--now do the actual update based on what I read and the logic above
update mytable set col1=@d1, col2=@d2 where mykey=@key

commit transaction

This way, any other connection running the same statements for the same data will wait at the first (fake) update statement until the previous transaction is done. This ensures that when the lock is released, only one connection is granted its requested update lock, and that connection reads committed, finalized data with which to make its calculations and decide whether and what to actually update in the second, 'real' update.

In other words, when you need to select information to decide if/how to update, you need a begin/commit transaction block, and you need to start with a fake update of what you are about to select, before you select it (an update with an OUTPUT clause will also do).
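
A sketch of that OUTPUT variant, using the same hypothetical mytable, @key, @d1 and @d2 as above; the no-op update takes the exclusive lock and returns the row's values in one step:

declare @row table (col1 int, col2 int)

begin transaction

--lock and read in one statement
update mytable set col1=col1
output inserted.col1, inserted.col2 into @row
where mykey=@key

select @d1=col1, @d2=col2 from @row

--...calculate, then run the real update as above...

commit transaction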
