Duplicate check using SELECT or rely on unique index?

Question

Using SQL Server 2012. I want to insert unique strings into a table. I always want to return the row ID of the unique string. Now, this can be accomplished in two ways.

Which solution is the best?

This is the table in question:

CREATE TABLE [dbo].[Comment](
    [CommentID] [int] IDENTITY(1,1) NOT NULL,
    [Comment] [nvarchar](256) NOT NULL

    CONSTRAINT [PK_Comment] PRIMARY KEY CLUSTERED([CommentID] ASC)
)

CREATE UNIQUE NONCLUSTERED INDEX [IX_Comment_Comment] ON [dbo].[Comment]
(
    [Comment] ASC
)

Solution 1:

SELECT first to check if the string exists. If it does, return its ID. Otherwise, INSERT a new row and return the newly created ID.

CREATE PROCEDURE [dbo].[add_comment]
    @Comment [nvarchar](256)
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @CommentID [int]
    DECLARE @TransactionCount [int]

    BEGIN TRY
        SET @TransactionCount = @@TRANCOUNT

        IF @TransactionCount = 0
            BEGIN TRANSACTION

        SELECT @CommentID = [CommentID] FROM [dbo].[Comment] WHERE [Comment] = @Comment

        IF @@ROWCOUNT = 0
        BEGIN
            INSERT INTO [dbo].[Comment]([Comment]) VALUES (@Comment)

            SET @CommentID = SCOPE_IDENTITY()
        END

        IF @TransactionCount = 0
            COMMIT TRANSACTION
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 AND @TransactionCount = 0 
            ROLLBACK TRANSACTION

        ; THROW
    END CATCH

    RETURN @CommentID
END

Solution 2:

INSERT first. If the insert violates the UNIQUE INDEX, a SELECT is issued.

CREATE PROCEDURE [dbo].[add_comment2]
    @Comment [nvarchar](256)
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @CommentID [int]

    BEGIN TRY
        INSERT INTO [dbo].[Comment]([Comment]) VALUES (@Comment)
        SET @CommentID = SCOPE_IDENTITY()
    END TRY
    BEGIN CATCH
        IF @@ERROR = 2601 -- Duplicate
            SELECT @CommentID = [CommentID] FROM [dbo].[Comment] WHERE [Comment] = @Comment
        ELSE
            THROW
    END CATCH

    RETURN @CommentID
END
GO

Solution 3:

Ideas? :)

http://stackoverflow.com/questions/5331461/is-checking-for-primary-key-value-before-insert-faster-than-using-try-catch — BlackICE, Oct 30 '13 at 15:10
http://stackoverflow.com/questions/7917695/sql-server-return-value-after-insert — npe, Oct 30 '13 at 15:14
You should be sending the value back using an output parameter, not return. Return should be used for error and status codes, not data. — Aaron Bertrand, Oct 30 '13 at 15:32
@AaronBertrand Is that always true? Where can I read about this convention? In this case, the concept of a return value was there for compatibility with MySQL. — l33t, Oct 31 '13 at 09:11
@l33t https://sqlblog.org/blogs/aaron_bertrand/archive/2009/10/09/bad-habits-to-kick-using-select-or-return-instead-of-output.aspx — Aaron Bertrand, Oct 31 '13 at 14:49

Aaron Bertrand · Accepted Answer · 2021-12-28T02:56:19.287

5

In my testing I have found that it is much more efficient to check for the violation first instead of letting SQL Server try and fail, especially when the failure rate is expected to be significant (and at best they perform about the same overall, when the failure rate is low). Details here and here.

In addition to performance, another reason to not rely on the constraint to raise an error is that tomorrow someone could change or drop it.

edited Dec 28 '21 at 02:56

answered Oct 30 '13 at 15:24

Aaron Bertrand

272,866
37
466
490

Finally a comprehensive test on this subject! – l33t Oct 30 '13 at 15:29

score 1 · Answer 2 · answered Oct 30 '13 at 15:10

Personally I do it like this

INSERT INTO [dbo].[Comment]([Comment])
SELECT T.Comment
FROM (SELECT @Comment) T (Comment)
LEFT JOIN dbo.Comment C ON C.Comment = T.Comment
WHERE C.Comment IS NULL

SELECT @CommentID = CommentID
FROM dbo.Comment C
WHERE C.Comment = @Comment

score 1 · Answer 3 · answered Oct 30 '13 at 15:13

1

You should also encapsulate it all within a transaction. I like the idea of a conditional insert.

answered Oct 30 '13 at 15:13

Adam Miller

767
1
9
22

Why a transaction (solution 2 I suppose)? – l33t Oct 30 '13 at 15:22
A transaction will keep it atomic, essentially, keeping the two statements in the same 'session'. It's good practice when performing any non-simple action on a database, where one part depends on the rest happening successfully. – Adam Miller Oct 30 '13 at 15:34

gvee · Answer 4 · 2013-10-30T15:37:33.777

1

I think this is the simplest option:

SET NOCOUNT ON;

BEGIN TRAN
    /* Insert if doesn't already exist */
    INSERT INTO dbo.comment (comment)
    SELECT @comment
    WHERE  NOT EXISTS (
             SELECT comment
             FROM   dbo.comment
             WHERE  comment = @comment
           );

    /* Return comment id */
    SELECT commentid
    FROM   dbo.comment
    WHERE  comment = @comment;
END TRAN

Alternative to the insert part if you prefer the newer MERGE syntax (please read this article by Aaron Bertrand first):

...
; WITH source AS (
  SELECT @comment As comment
)
MERGE INTO dbo.comment As destination
  USING source
    ON source.comment = destination.comment 
WHEN NOT MATCHED THEN
  INSERT (comment)
    VALUES (source.comment)
;
...

edited Oct 30 '13 at 15:37

answered Oct 30 '13 at 15:22

gvee

16,732
35
50

3

Oh no. MERGE absolutely does not buy you anything performance-wise, and it introduces other potential problems. Not only have I stopped recommending it to anyone, I've actually changed customers' code back to old-style upserts. Read: http://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/ – Aaron Bertrand Oct 30 '13 at 15:30
@AaronBertrand thank you for the link - I have just given it a read and have to admit I had naively overlooked many of the points you raised. Well presented and clear - I will update my answer to recommend reading this article. – gvee Oct 30 '13 at 15:36

NoChance · Answer 5 · 2013-10-30T16:03:15.070

-2

I would go with insert first then grab the result. The db will do the check internally. You will need to write the logic for duplicate key anyway so use it. This way you make 1 call to the database not 2. Also, who knows what would happen between the 2 calls?

Edit: Based on the run results from Aaron Bertrand - This answer is not always valid.

edited Oct 30 '13 at 16:03

answered Oct 30 '13 at 15:09

NoChance

5,632
4
31
45

how are either of those 2 calls to the database, it's all in a stored proc? – BlackICE Oct 30 '13 at 15:12
I would consider Select and Insert to be 2 actions against the database each may require row data and index fetches. – NoChance Oct 30 '13 at 15:15
that would be why there is a transaction in there. – BlackICE Oct 30 '13 at 15:18
@BlackICE, a transaction will not prevent the database from changing between the 2 actions. – NoChance Oct 30 '13 at 15:45
umm, that's exactly what a transaction is for. – BlackICE Oct 30 '13 at 17:54
My understanding is that a transaction will provide a consistent view of data within the instance of code that is executing but will not lock the database to prevent other instances(transactions) from modifying the database. That is why there are different types of locks that could be explicitly requested by the developer. – NoChance Oct 31 '13 at 00:39

Duplicate check using SELECT or rely on unique index?

5 Answers5