7

I'm trying to implement your basic UPSERT functionality, but with a twist: sometimes I don't want to actually update an existing row.

Essentially I'm trying to synchronize some data between different repositories, and an Upsert function seemed like the way to go. So based largely on Sam Saffron's answer to this question, as well as some other research and reading, I came up with this stored procedure:

(note: I'm using MS SQL Server 2005, so the MERGE statement isn't an option)

CREATE PROCEDURE [dbo].[usp_UpsertItem] 
    -- Add the parameters for the stored procedure here
    @pContentID varchar(30) = null, 
    @pTitle varchar(255) = null,
    @pTeaser varchar(255) = null 
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    BEGIN TRANSACTION

        UPDATE dbo.Item WITH (SERIALIZABLE)
        SET Title = @pTitle,
            Teaser = @pTeaser
        WHERE ContentID = @pContentID

        IF @@rowcount = 0
            INSERT INTO dbo.Item (ContentID, Title, Teaser)
            VALUES (@pContentID, @pTitle, @pTeaser)

    COMMIT TRANSACTION
END

I'm comfortable with this for a basic Upsert, but I'd like to make the actual update conditional on the value of another column. Think of it as "locking" a row so that no further updates may be made by the Upsert procedure. I could change the UPDATE statement like so:

UPDATE dbo.Item WITH (SERIALIZABLE)
SET Title = @pTitle,
    Teaser = @pTeaser
WHERE ContentID = @pContentID
AND RowLocked = false

But then the subsequent Insert would fail with a unique constraint violation (for the ContentID field) when it tries to insert a row that already exists but wasn't updated because it was "locked".

So does this mean that I no longer have a classic Upsert, i.e. that I'll have to select the row every time to determine whether it can be updated or inserted? I'm betting that's the case, so I guess what I'm really asking for is help getting the transaction isolation level correct so that the procedure will execute safely.

Community
  • 1
  • 1
Matt
  • 5,052
  • 4
  • 36
  • 54
  • What is RowLocked in (AND RowLocked = false)? Is it a column in your table? – A-K Jul 10 '09 at 02:26
  • @AlexKuznetsov - Yes, RowLocked is supposed to be a table column; in reality, there are a couple columns that dictate whether or not a row should be "locked" (i.e. not updated by this procedure) but I simplified my SQL to try to make my question more clear; got a little sloppy with the syntax though - it should, of course, be "AND RowLocked = 0" and I should have mentioned it's a bit column. – Matt Jul 10 '09 at 15:34

5 Answers5

2

I slapped together the following script to proof this trick I used in years past. If you use it, you'll need to modify it to suit your purposes. Comments follow:

/*
CREATE TABLE Item
 (
   Title      varchar(255)  not null
  ,Teaser     varchar(255)  not null
  ,ContentId  varchar(30)  not null
  ,RowLocked  bit  not null
)


UPDATE item
 set RowLocked = 1
 where ContentId = 'Test01'

*/


DECLARE
  @Check varchar(30)
 ,@pContentID varchar(30)
 ,@pTitle varchar(255)
 ,@pTeaser varchar(255)

set @pContentID = 'Test01'
set @pTitle     = 'TestingTitle'
set @pTeaser    = 'TestingTeasier'

set @check = null

UPDATE dbo.Item
 set
   @Check = ContentId
  ,Title  = @pTitle
  ,Teaser = @pTeaser
 where ContentID = @pContentID
  and RowLocked = 0

print isnull(@check, '<check is null>')

IF @Check is null
    INSERT dbo.Item (ContentID, Title, Teaser, RowLocked)
     values (@pContentID, @pTitle, @pTeaser, 0)

select * from Item

The trick here is that you can set values in local variables within an Update statement. Above, the "flag" value gets set only if the update works (that is, the update criteria are met); otherwise, it won't get changed (here, left at null), you can check for that, and process accordingly.

As for the transaction and making it serializable, I'd like to know more about what must be encapsulated within the transaction before suggesting how to proceed.

-- Addenda, follow-up from second comment below -----------

Mr. Saffron's ideas are a thorough and solid way of implementing this routine since your primary keys are defined outside and passed into the database (i.e. you're not using identity columns--fine by me, they are often overused).

I did some more testing (added a primary key constraint on column ContentId, wrap the UPDATE and INSERT in a transaction, add the serializable hint to the update) and yes, that should do everything you want it to. The failed update slaps a range lock on that part of the index, and that will block any simultaneous attempts to insert that new value in the column. Of course, if N requests are submitted simultaneously, the "first" will create the row, and it will be immediately updated by the second, third, etc.--unless you set the "lock" somewhere along the line. Good trick!

(Note that without the index on the key column, you'd lock the entire table. Also, the range lock may lock the rows on "either side" of the new value--or maybe they won't, I didn't test that one out. Shouldn't matter, since the duration of the operation should [?] be in single-digit milliseconds.)

Philip Kelley
  • 39,426
  • 11
  • 57
  • 92
  • To mention, in the original sample code you update table Item, but inserted into table MailItem; aren't upserts supposed to be applied against the same table? – Philip Kelley Jul 09 '09 at 23:06
  • The mismatched table names are a typo (now corrected). I knew you could set local variable with a SELECT, but I'd never tried it with an UPDATE, so that might just do the trick. Regarding the serializable transaction, my (admittedly imperfect) understanding is that if you don't use some sort of lock you can get unique key constraint violations and that "UPDATE with (serializable)" appropriately does this w/out deadlocks. I'm working from the example in the linked question (above) and still reading/trying to make sure I understand exactly what that does. – Matt Jul 09 '09 at 23:41
  • Updated my answer with feedback on the above comment. – Philip Kelley Jul 10 '09 at 14:11
  • @Philip, thanks for the update and the plain language about hints and locks. However, when I tested I realized your approach using "IF @Check is null" is functionaly equivalent to the original "IF @@rowcount = 0" because @Check will only have a value if the row is updated. When the row exists but isn't updated, @check is null and I get a unique constraint violation from the subsequent insert. So your code works as described, it just doesn't solve my problem ;) Still, it helped, so +1. – Matt Jul 10 '09 at 17:12
1
BEGIN TRANSACTION

IF EXISTS(SELECT 1 FROM dbo.Item WHERE ContentID = @pContentID)
     UPDATE dbo.Item WITH (SERIALIZABLE)
     SET Title = @pTitle, Teaser = @pTeaser
     WHERE ContentID = @pContentID
     AND RowLocked = false
ELSE
     INSERT INTO dbo.Item
          (ContentID, Title, Teaser)
     VALUES
          (@pContentID, @pTitle, @pTeaser)

COMMIT TRANSACTION
MyItchyChin
  • 13,733
  • 1
  • 24
  • 44
  • What is RowLocked in (AND RowLocked = false)? Is it a column in your table? – A-K Jul 10 '09 at 02:26
  • 2
    I stress tested what I understood as your approach, and it does not hold up in high concurrency. – A-K Jul 10 '09 at 02:38
-1

You could switch the order of the update/insert around. So you do the insert within a try/catch and if you get a constraint violation then do the update. It feels a little dirty though.

Steve Weet
  • 28,126
  • 11
  • 70
  • 86
  • I always thought that you weren't supposed to rely on error handlers for "normal" processing, i.e. if I know that a typical use case will raise an exception, then I should check for that condition and handle it before it raises an exception. So I agree, it does feel a little dirty ;) If I can get the isolation level right (I'm still reading) then the logic is fairly straightforward - but I lose the original advantage of the upsert (i.e. no extra DB read). – Matt Jul 09 '09 at 23:21
-2

CREATE PROCEDURE [dbo].[usp_UpsertItem] -- Add the parameters for the stored procedure here @pContentID varchar(30) = null, @pTitle varchar(255) = null, @pTeaser varchar(255) = null AS BEGIN -- SET NOCOUNT ON added to prevent extra result sets from -- interfering with SELECT statements. SET NOCOUNT ON;

BEGIN TRANSACTION
    IF EXISTS (SELECT 1 FROM dbo.Item WHERE ContentID = @pContentID
             AND RowLocked = false)
       UPDATE dbo.Item 
       SET Title = @pTitle, Teaser = @pTeaser
       WHERE ContentID = @pContentID
             AND RowLocked = false
    ELSE IF NOT EXISTS (SELECT 1 FROM dbo.Item WHERE ContentID = @pContentID)
            INSERT INTO dbo.Item (ContentID, Title, Teaser)
            VALUES (@pContentID, @pTitle, @pTeaser)

COMMIT TRANSACTION

END

JNappi
  • 1,495
  • 1
  • 14
  • 24
  • 3
    Horrible horrible code! Not only are you running a query twice for the same conditional operation but you're using "else if not exists" when a simple "else" would do. See CptSkippy's answer for a better example. – Chris Jul 10 '09 at 01:45
  • I agree the other solution is cleaner, but horrible, horrible...I was pointing in the right direction, no? – JNappi Jul 10 '09 at 18:22
-2

I'd drop the transaction.

Plus the @@rowcount probably would work, but using global variables as a conditional check will lead to bugs.

Just do an Exists() check. You have to make a pass through the table anyhow, so speed is not the issue.

No need for the transaction as far as I can see.

Andrew
  • 326
  • 1
  • 7
  • 1
    The update/insert pattern with rowcount is only safe because he's using serializable to lock until the insert. Otherwise, the insert could conflict with a concurrent attempt to update that also matches no rows and then double insert causes duplicate rows or duplicate key error if you have unique keys. – Jeremiah Gowdy Mar 20 '14 at 22:29