3

I've been searching around for a question similar to this for awhile now, and I haven't found anything, so if this has been asked before, this will at least serve as a good pointer for those ignorant of the proper nomenclature.

I want to INSERT INTO a table if a row doesn't already exist, based on a unique key. It it does exist, then I want to get the primary key Id of that row.

Imagine a table that holds email addresses:

EmailAddressId(PK) | EmailAddress(UK)

I want to INSERT into that table a new Email Address, but there is a unique constraint on EmailAddress. Thus, if the new Email Address is the same as an existing, the INSERT will fail. In that case, I want to select the existing EmailAddressId from the database for the EmailAddress.

I want to do this in the fewest number of operations, assuming that collisions will be a rare case.

Thus, I setup a TRY...CATCH block within a Stored Procedure as follows:

ALTER PROCEDURE [dbo].[EmailAddressWrite]
    @EmailAddress nvarchar[256]
BEGIN
SET NOCOUNT ON;
    BEGIN TRANSACTION

    DECLARE @EmailAddressId INT

    BEGIN TRY
        INSERT INTO EmailAddress VALUES (@EmailAddress)
        SET @EmailAddressId = (SELECT SCOPE_IDENTITY())
    END TRY
    BEGIN CATCH
        SET @EmailAddressId = (SELECT EmailAddressId FROM EmailAddress WHERE EmailAddress = @EmailAddress)
    END CATCH

    --Do some more stuff with the Id now.

    COMMIT TRANSACTION

    RETURN @EmailAddressId
END

The code above functions, and produces the required result, but the Internet makes me think that using TRY...CATCH in this fashion might be slow...thus I'm unsure if this is an optimal solution.

I've only found one other solution which is to SELECT first, and INSERT second. This would result in 2 operations almost all of the time, as I am anticipating very few duplicate email addresses (at least for a month or more).

  1. Is this the optimal solution to achieve 1 operation on INSERT and 2 operations on INSERT fail?
  2. What other solutions can achieve 1 operation on INSERT and 2 operations on INSERT fail?

If I've misused any terminology, please correct it.

crush
  • 16,713
  • 9
  • 59
  • 100

1 Answers1

5
DECLARE @id INT

DECLARE @newid TABLE
        (
        emailAddressId INT NOT NULL PRIMARY KEY
        )

;
WITH    t AS
        (
        SELECT  *
        FROM    emailAddress WITH (ROWLOCK, HOLDLOCK)
        WHERE   emailAddress = @emailAddress
        )
MERGE
INTO    t
USING   (
        SELECT  @emailAddress
        ) s (emailAddress)
ON      1 = 1
WHEN NOT MATCHED BY TARGET THEN
INSERT  (emailAddress)
VALUES  (emailAddress)
WHEN MATCHED THEN
UPDATE
SET     @id = 1
OUTPUT  INSERTED.emailAddressId
INTO    @newid
;
Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • @usr: right, I'm just used to lock the record as well this way. – Quassnoi Jun 29 '15 at 22:26
  • Sorry, I'm a bit confused. @usr are you suggesting that a local variable may be updated instead of locking? Or do both need to be done? – crush Jun 29 '15 at 22:27
  • @crush: if you do an actual update, it will result in a page write (which is not necessary) and the record being locked until the end of the transaction (which might be useful). Not sure it you need it or not. – Quassnoi Jun 29 '15 at 22:28
  • I usually apply the lock hints in the TARGET clause. That works well when the MERGE targets exactly 0 or 1 rows. – usr Jun 29 '15 at 22:29
  • @usr: sure, that's even better. – Quassnoi Jun 29 '15 at 22:30
  • Am I reading this right? This will lock the row with an existing email from being updated, and lock the table from being inserted. Then, it will attempt to insert the email into the table if it doesn't exist. If it does exist, it will update the `@id` variable with 1? How will @id get the EmailAddressId of the existing row? Should the final statement be `OUTPUT INSERTED.id INTO @id`? – crush Jun 29 '15 at 23:11
  • @crush: this will lock the record with the existing email till the end of transaction, if any. It won't (necessarily) lock the other records, though it still might. Yes, it will update `@id` with 1. `@id` won't get the `EmailAddressId` of the existing row, it will get 1. The reason `@id` is there is that `MERGE` has to have the `UPDATE` clause for the `OUTPUT` to return something in case of a match (as opposed to insert). If you want to use `INTO` in the `OUTPUT` clause, its destination should be a table or a table variable, not a scalar. – Quassnoi Jun 29 '15 at 23:16
  • I need the `EmailAddressId` of the existing row for later in the stored procedure. Can I get this with `OUTPUT`? – crush Jun 29 '15 at 23:21
  • I'm having trouble getting the id out of the inserted table. It tells me the column is not defined - I'm not sure what columns are defined, and am having trouble finding out. I tried `SELECT * FROM inserted` after the merge statement, but it thinks inserted doesn't exist at that point. I'm using [MSDN](https://msdn.microsoft.com/en-us/library/ms191300(v=sql.110).aspx) article as reference. – crush Jun 30 '15 at 14:16
  • @crush: could you please post your table definition? – Quassnoi Jun 30 '15 at 14:32
  • See this [gist](https://gist.github.com/crush83/c767f6346d7e6c9d4c0c). Updated (was missing Email Address unique key constraint for some reason). – crush Jun 30 '15 at 14:49
  • Seems I needed to just return EmailAddressId from my `WITH ... SELECT`. `SELECT EmailAddress, EmailAddressId` so that it had that column. Your example was using the `*` selector, so you didn't run into that problem. Completely my fault. Thanks for the help. – crush Jun 30 '15 at 14:54