
I have a table that I use to keep track of customer usage. I track the number of hits the user does per day, so I have a table that looks like this:

CustID (uniqueidentifier, not null)

UseDate (smalldatetime, not null)

NumHits (smallint, not null)

I use this SQL in a stored proc to insert a row for today (if needed), or to increment the counter for today:

declare @today datetime
set @today = getdate()

/* Try to bump it by one if an entry for today exists */
if (
    select count(*) from CustomerUsage
    where CustID = @cust_guid and year(UseDate) = year(@today) and month(UseDate) = month(@today) and day(UseDate) = day(@today)
    ) = 0

    insert into CustomerUsage (CustID, UseDate, NumHits) values (@cust_guid, getdate(), 1)

else
    update CustomerUsage set NumHits = NumHits + 1 
    where CustID = @cust_guid and year(UseDate) = year(@today) and month(UseDate) = month(@today) and day(UseDate) = day(@today)

Is there a better way to do this?

Also, I want to move to an environment where multiple web servers can call the stored proc for the same customer. I think this code might be vulnerable to multithreading issues.

Thanks!

marc_s
Greg
  • this has been covered extensively, see: http://stackoverflow.com/questions/108403/solutions-for-insert-or-update-on-sql-server and linked questions – Sam Saffron Feb 21 '11 at 03:07

3 Answers


First, you can convert the current date and time into a Date only value by using:

DateAdd(d, DateDiff(d, 0, CURRENT_TIMESTAMP), 0)
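In other words, the expression counts the whole days elapsed since SQL Server's day zero (1900-01-01) and adds them back, which drops the time-of-day portion. A rough Python sketch of the same arithmetic (for illustration only; `truncate_to_day` is a name I made up):

```python
from datetime import datetime, timedelta

def truncate_to_day(ts: datetime) -> datetime:
    # Mirrors DateAdd(d, DateDiff(d, 0, ts), 0): count whole days since
    # SQL Server's day zero, then add them back to day zero, which
    # leaves midnight of the same date.
    day_zero = datetime(1900, 1, 1)
    whole_days = (ts - day_zero).days
    return day_zero + timedelta(days=whole_days)
```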

One solution is to use two statements. However, to protect against a Phantom Read, you would need to wrap the statements in a transaction and set the isolation level to Serializable or Snapshot (if implemented). Obviously, that's going to hurt concurrency but would ensure consistency.

Declare @Today datetime
Set @Today = DateAdd(d, DateDiff(d, 0, CURRENT_TIMESTAMP), 0)

Set Transaction Isolation Level Serializable;
Set Xact_Abort On;

Begin Tran;

Update CustomerUsage 
Set NumHits = NumHits + 1 
Where CustID = @cust_guid 
    And UseDate >= @Today
    And UseDate < DateAdd(d, 1, @Today)

Insert CustomerUsage( CustId, UseDate, NumHits )
Select @cust_guid, CURRENT_TIMESTAMP, 1
From ( Select 1 As Value ) As Z
Where Not Exists    (
                    Select 1
From CustomerUsage
                    Where CustID = @cust_guid 
                        And UseDate >= @Today
                        And UseDate < DateAdd(d, 1, @Today)
                    )

Commit Tran;

With the statements wrapped in a serializable transaction, simultaneous calls are safe. If you are using SQL Server 2008, then you can make use of the Date data type and the Merge statement to achieve the same:

Declare @Today date
Set @Today = Cast( CURRENT_TIMESTAMP As date )

Merge CustomerUsage As target
Using ( 
        Select CustId, UseDate
        From CustomerUsage
        Where CustID = @cust_guid 
            And UseDate >= @Today
            And UseDate < DateAdd(d, 1, @Today)
        Union All
        Select @cust_guid, CURRENT_TIMESTAMP 
        From ( Select 1 As Value ) As Z
        Where Not Exists    (
                            Select 1
                            From CustomerUsage
                            Where CustID = @cust_guid 
                                And UseDate >= @Today
                                And UseDate < DateAdd(d, 1, @Today)
                            )
        ) As source
    On source.CustID = target.CustID
        And source.UseDate = target.UseDate
When Matched Then 
    Update Set NumHits = NumHits + 1
When Not Matched Then 
    Insert ( CustId, UseDate, NumHits )
    Values( source.CustId, source.UseDate, 1 )

Final Addition

While I realize that the answer has been chosen, it occurs to me that there is a better solution. There is no need to update a counter of hits per day. Just do an insert that logs that a hit occurred (i.e., only an insert, without tracking NumHits) and, on the reporting side, roll up the hits per day.
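A minimal sketch of that insert-only design, using SQLite in place of SQL Server purely for illustration (the table and column names are the ones from the question; `record_hit` and `hits_per_day` are made-up helper names):

```python
import sqlite3
from datetime import datetime

# One row per hit, no counter column to update, so there is no
# read-then-write race on insert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerUsage (CustID TEXT, UseDate TEXT)")

def record_hit(cust_id: str, ts: datetime) -> None:
    # A plain insert; nothing to check first.
    conn.execute("INSERT INTO CustomerUsage VALUES (?, ?)",
                 (cust_id, ts.isoformat()))

def hits_per_day(cust_id: str):
    # Reporting-side rollup: group the raw hits by calendar day.
    return conn.execute(
        "SELECT date(UseDate) AS Day, COUNT(*) AS NumHits "
        "FROM CustomerUsage WHERE CustID = ? "
        "GROUP BY date(UseDate) ORDER BY Day",
        (cust_id,),
    ).fetchall()
```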

Thomas
  • +1 for using `not exists` on the insert statement but I think you can miss a `NumHits + 1` in your first solution. Do the insert before the update with `NumHits = 0` instead and let the update always increment by one. – Mikael Eriksson Feb 20 '11 at 21:58
  • I tried this, but it didn't insert a row when there was no match. It seemed to update the field when it did match. – Greg Feb 20 '11 at 22:37
  • @Mikael Eriksson - RE: first solution, perhaps I corrected it after you viewed it? Should be right now. If you do the Insert first, then the Update will always fire meaning you run the risk of updating the row twice. If I wanted to go further, I could add a `If @@RowCount = 0` before I run the Insert. – Thomas Feb 20 '11 at 23:33
  • @Gregor S - Are you referring to the first solution or the use of the Merge statement? – Thomas Feb 20 '11 at 23:36
  • @Gregor S - I believe you are referencing the `Merge` statement which I corrected. In short, you need the source to return all rows you might want to insert into your target table. Thus, I needed to add a Union All to accommodate the scenario where the main query was empty. – Thomas Feb 20 '11 at 23:48
  • If two threads run your non-merge version at the same time, both threads will try to update a non-existing row and of course not find one. Then they will both try to add a new row. Only one will insert a new row. The result will be that `NumHits` will be 1 instead of the desired 2. – Mikael Eriksson Feb 21 '11 at 08:50
  • @Mikael Eriksson - What you are discussing is a phantom read. The only way to protect against that is to use a transaction with the isolation level set to Serializable or Snapshot. I've updated my solution to account for this scenario. – Thomas Feb 21 '11 at 17:46
  • @Thomas: Your update fixed the problem. I don't know about phantom reads. Never heard about it. Changing places of the update and insert seems to me like an easier option. If you first do the insert and then the update, the code does not even have to be in the same stored proc/server roundtrip, let alone the same transaction. – Mikael Eriksson Feb 21 '11 at 18:19
  • @Mikael Eriksson - Suppose I swap the Insert and Update and I have two threads: T1 and T2 and isolation level = Read Committed. T1 executes and does an Insert; NumHits = 1. T2 executes simultaneously and does its Insert; NumHits = 1 and we have a second row. T1 now executes its Update which sets both rows NumHits = 2. T2 now executes and sets both rows NumHits = 3. – Thomas Feb 21 '11 at 19:24
  • @Mikael Eriksson - Actually, it occurs to me that there is no need for the update statement. Why keep track of the hits for the exact datetime when he's only checking by day? Just do inserts and on the reporting side track how many hits per day. – Thomas Feb 21 '11 at 19:30
  • No, you still need the where not exists clause on the insert statement. That will prevent thread 2 from inserting. As far as I know you are not allowed to do simultaneous inserts on a table. And the insert should not set NumHits = 1; it should set NumHits = 0. – Mikael Eriksson Feb 21 '11 at 19:33
  • @Mikael Eriksson - IMO, it would be better to just do the inserts and count the results per day afterwards. I.e., skip the updates. – Thomas Feb 21 '11 at 19:37
  • @Mikael Eriksson - You would rip out the Exists clause from the Insert statement and simply let the table insert multiple rows on the same day. To determine NumHits on a given day, you would simply rollup by CustId and Day. It would be significantly faster and would scale better. – Thomas Feb 21 '11 at 19:39
  • @Mikael Eriksson - RE: two threads example. If both threads do their insert simultaneously, then both their Not Exists checks both return true at the same time. – Thomas Feb 21 '11 at 19:41
  • @Thomas: That is a great idea and probably what I would have done as well. OP has decided that he wants to know the hits per day. In the future he may very well want to know hits per hour or something like that. Logging all hits is a much better and flexible solution. BTW, your Cthulhu image brings back memories from some game I used to play like 25 years ago :). – Mikael Eriksson Feb 21 '11 at 19:44
  • @Thomas, There is a lock preventing inserts before the not exist clause is evaluated. – Mikael Eriksson Feb 21 '11 at 19:45
  • @Mikael Eriksson - The Inserts only create shared locks under Read Committed. Thus, both Not Exists clauses would not be blocked, both would return false and, barring a unique constraint, both inserts would occur. Read Committed does not prevent a phantom read. A phantom read is where an insert occurs after you queried your data but before your transaction committed. Only if you set the isolation level to Serializable (or Snapshot) could you prevent the second Not Exists statement from evaluating until the first Insert had completed. – Thomas Feb 21 '11 at 19:52
  • @Thomas. You are right and my assumptions are wrong. I have tested this and the insert may end up with a unique constraint error. Switching to `serializable` took care of that, but instead I got a deadlock in one thread. Not nearly as often as the unique constraint error. The only solution I see now, if you want to do an "insert if not exist", is to embed the insert in a try..catch. Not something I can recommend to anybody right now because I know too little about it. There might be side effects I don't know about. – Mikael Eriksson Feb 22 '11 at 11:04
  • A bit more testing shows that the winner is `merge...when not matched` and isolation level `serializable`. With `read committed` I still get unique constraint error using merge. No deadlocks with merge. – Mikael Eriksson Feb 22 '11 at 12:15
  • Updated my answer. `with (updlock, serializable)` takes care of unique constraint error and deadlocks. Thanks Thomas, I really learned something here. – Mikael Eriksson Feb 22 '11 at 12:32
  • @Mikael Eriksson - Right. Regardless of approach, the right solution is to simply do inserts with no updates and do the counting afterwards. In that way, there are no deadlock errors (or unique constraint errors since having two entries on the same day and time is acceptable.) Still, a UC would be better but SQL 2005 or prior's time resolution isn't granular enough. With SQL 2008 however, you can go down to nanoseconds that would make a UC feasible. – Thomas Feb 22 '11 at 17:49

You have the chance of the following scenario happening:

  1. One connection executes the if-statement expression, finding out that the current count is 0
  2. Another connection executes the if-statement expression, finding out that the current count is 0
  3. The first connection manages to insert a new row
  4. The second connection's insert fails with a duplicate-key exception (assuming a unique key on customer + date), or silently creates a second row if there is no such key

Also, it looks strange that you store the full date and time and then use functions to check if you have an existing row for today. If you need to record the date for which the row applies, and the time the row was added, I would probably separate that into two columns so that you could use indexes on the first. Those functions are not sargable, which means the query optimizer won't be able to use indexes to find the matching row.

To solve the concurrency issue I would either:

  1. Issue a lock on the table before doing anything
  2. or, just execute the insert as though it was the first row, every time, and handle the exception on duplicate key (you have a key that includes customer id + date, right?) by updating instead. This would make each statement complete by itself.

In a normal programming language, the second option would look like the following; you would have to rewrite it for SQL:

try
{
    Insert ...
}
catch (DuplicateKeyException)
{
    Update ...
}
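Some databases collapse this pattern into a single atomic statement. A sketch using SQLite's upsert syntax, purely as an illustration (SQL Server itself would use `Merge` or a `Try...Catch` around the insert; the unique key on customer + date is the one this answer assumes):

```python
import sqlite3
from datetime import date

# A composite primary key plus "ON CONFLICT ... DO UPDATE" makes the
# insert-or-update a single atomic statement, so there is no window
# between the existence check and the write.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerUsage ("
    " CustID TEXT NOT NULL, UseDate TEXT NOT NULL, NumHits INTEGER NOT NULL,"
    " PRIMARY KEY (CustID, UseDate))"
)

def bump(cust_id: str, day: date) -> None:
    # Insert a first hit, or increment the existing row's counter.
    conn.execute(
        "INSERT INTO CustomerUsage (CustID, UseDate, NumHits) VALUES (?, ?, 1) "
        "ON CONFLICT (CustID, UseDate) DO UPDATE SET NumHits = NumHits + 1",
        (cust_id, day.isoformat()),
    )
```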
Lasse V. Karlsen

You can do it like this:

declare @Today smalldatetime = dateadd(d, datediff(d, 0, getdate()), 0)
declare @Tomorrow smalldatetime = dateadd(d, 1, @Today)

insert into CustomerUsage(CustId, UseDate, NumHits )
select Data.CustID, Data.UseDate, Data.NumHits
from (select
        @cust_guid as CustID,
        getdate() as UseDate,
        0 as NumHits) as Data
where not exists (select *
                  from CustomerUsage with (updlock, serializable)
                  where
                    CustID = @cust_guid and
                    UseDate >= @Today and
                    UseDate < @Tomorrow)        


update CustomerUsage 
set NumHits = NumHits + 1 
where
  CustID = @cust_guid and
  UseDate >= @Today and
  UseDate < @Tomorrow

First insert a row with NumHits = 0. The insert checks if there already exists a row for that CustID on that UseDate and only inserts if necessary.

After the insert always increment NumHits by 1.

This is basically the same idea as Thomas's non-merge answer, but I do the insert before the update.

Edit 1: Added table hints to the insert statement's `where not exists` part. `serializable` is needed to prevent a primary key constraint violation and `updlock` is needed to avoid deadlocks.
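The same two-statement flow can be sketched outside T-SQL; here is an illustrative version in SQLite (which has no table hints, so the `updlock, serializable` locking concern is out of scope in this sketch, and `bump` is a made-up name):

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerUsage (CustID TEXT, UseDate TEXT, NumHits INTEGER)"
)

def bump(cust_id: str, day: date) -> None:
    d = day.isoformat()
    # Step 1: insert a zero-hit row only if today's row is missing.
    conn.execute(
        "INSERT INTO CustomerUsage (CustID, UseDate, NumHits) "
        "SELECT ?, ?, 0 WHERE NOT EXISTS ("
        "  SELECT 1 FROM CustomerUsage WHERE CustID = ? AND UseDate = ?)",
        (cust_id, d, cust_id, d),
    )
    # Step 2: always increment, whether the row was just created or not.
    conn.execute(
        "UPDATE CustomerUsage SET NumHits = NumHits + 1 "
        "WHERE CustID = ? AND UseDate = ?",
        (cust_id, d),
    )
```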

Mikael Eriksson