0

There is a post here that talks about this as being a bad idea and that instead of using a unique constraint I have to always INSERT only data not in the database. Now my table structure is as follows:

CREATE TABLE [dbo].[TEST](
    [ColA] [varchar](255) NULL,
    [ColB] [datetime] NULL,
    [ColC] [datetime] NULL,
    [ColD] [int] NOT NULL,
) ON [PRIMARY]

Only unique rows are expected so for me it makes sense in setting up a unique constraint as:

ALTER TABLE dbo.TEST
  ADD CONSTRAINT uniqueRows UNIQUE (ColA, ColB, ColC, ColD) WITH (IGNORE_DUP_KEY = ON)

When new information is being consumed, due to many reasons, there could be duplicate data being processed because my code does not maintain state and hence at least to me it makes sense to ignore any duplicate data.

However, in the linked post, @PhilipKelley in his answer, says that this is a bad idea and that one should make checks something along the lines of:

INSERT INTO X 
VALUES(Y,Z)    
WHERE Y  NOT IN (SELECT Y FROM X)

which in my case translates to:

INSERT INTO dbo.TEST 
VALUES(ValA,ValB,ValC,ValD) 
WHERE (Some complicated check)

Or perhaps make some fancy primary key. The response here hints that if I know this is indeed being used as a feature, I can go ahead and use the IGNORE_DUP_KEY option and things will be ok. What is the suggested path in my case?

Community
  • 1
  • 1
Legend
  • 113,822
  • 119
  • 272
  • 400
  • Are you trying to prevent duplicate data, or are you trying to prevent exceptions (and care about hiding exceptions more than preventing duplicates)? – Aaron Bertrand Jul 02 '12 at 00:54
  • @AaronBertrand: I am trying to prevent exceptions. What I am thinking is that to prevent duplicate data, we have to either (1) check for duplicates in our table using some `SELECT` statement and then ignore if row already exists, or (2) impose a unique constraint so that the database takes care of it. I'm just not sure which is better. – Legend Jul 02 '12 at 00:59
  • 1
    I wrote a little bit about this here: http://www.mssqltips.com/sqlservertip/2632/checking-for-potential-constraint-violations-before-entering-sql-server-try-and-catch-logic/ The gist is that for performance reasons you are better off checking for the violations and not inserting them vs. just letting the database raise exceptions (and swallow them with try/catch to prevent them from bubbling up further). But you should still have the constraints in place to protect your data at the lowest possible layer. – Aaron Bertrand Jul 02 '12 at 01:04

1 Answers1

1

Personally I think you should do three things:

  1. Absolutely have the unique constraint to protect the data at its most basic level.
  2. Check for potential violations before blindly inserting rows.
  3. Wrap TRY/CATCH around the eventual insert to protect users from exceptions where the check fails.

For single-row inserts, this can be as simple as:

BEGIN TRY
  INSERT dbo.TEST(ValA, ValB, ValC, ValD)
    SELECT @ValA, @ValB, @ValC, @ValD
    WHERE NOT EXISTS
    (SELECT 1 FROM dbo.TEST
      WHERE ValA = @ValA AND ValB = @ValB AND ValC = @ValC AND ValD = @ValD);
END TRY
BEGIN CATCH
  PRINT 'Move along, nothing to see here...';
END CATCH

It gets a little more complicated if any or all of these columns are nullable.

For multi-row inserts, you can handle this in a variety of ways. You can have the entire batch fail if there is a non-unique value (either in the batch alone or conflicting with the table), or you can allow just the successful rows into the table. Situation A:

  IF EXISTS (SELECT 1 FROM @SourceOfMultipleRows AS r
    WHERE EXISTS (SELECT 1 FROM dbo.Test AS t WHERE t.ValA = r.ValA AND ...))
  OR EXISTS (SELECT 1 FROM @SourceOfMultipleRows
    GROUP BY ValA, ValB, ValC, ValD HAVING COUNT(*) > 1)
  BEGIN
    PRINT 'Not proceeding at all.';
  END
  ELSE
  BEGIN
    BEGIN TRY
      INSERT dbo.TEST(ValA, ValB, ValC, ValD)
        SELECT ValA, ValB, ValC, ValD
        FROM @SourceOfMultipleRows AS r
        WHERE NOT EXISTS (SELECT 1 FROM dbo.Test AS t
          WHERE t.ValA = r.ValA AND ...)
        GROUP BY ValA, ValB, ValC, ValD;
    END TRY
    BEGIN CATCH
      PRINT 'Move along, nothing to see here...';
    END CATCH
  END

Scenario B, where you want to keep the good rows and ignore duplicates:

BEGIN TRY
  INSERT dbo.TEST(ValA, ValB, ValC, ValD)
    SELECT ValA, ValB, ValC, ValD
    FROM @SourceOfMultipleRows AS r
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Test AS t
      WHERE t.ValA = r.ValA AND ...)
    GROUP BY ValA, ValB, ValC, ValD;
END TRY
BEGIN CATCH
  PRINT 'Move along...';
END CATCH
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490