0

Consider a table A with a primary key, column1, column2.

Now let column1 be the primary key which is of type uniqueidentifier.

Now if you want to avoid duplicate inserts into the table using a stored procedure, which would be the better way?

Case #1:

IF NOT EXISTS (SELECT 1 FROM TABLE A WHERE column1 = @insertValue)
BEGIN
    INSERT INTO Table A
     ....
END

or case #2:

TRY
BEGIN
    INSERT INTO TABLE A
END
CATCH
BEGIN
    IF(ErrorNumber() = 2627)--Erro number of primary key violation
    BEGIN
        ---------- deal with the error
    END
END

I feel case #2 is better, because only 1 time SQL check operation is done, but in case#1 it is happening 2 times.

I understand when we create a primary key on a table, SQL checks for duplicates before insert, using this existing mechanism of SQL check for duplication isn't it good? instead of using IF NOT EXISTS

Please add your answer, please explain the answer in depth considering all the parameters

Guys please help me in understanding in terms of sql operation on a table. Objective here is to understand the internal mechanism of sql primary key violation check,

gotqn
  • 42,737
  • 46
  • 157
  • 243
Pavan
  • 9
  • 3
  • 3
    I would argue that #1 is "better" - since you're checking to **avoid** the error - instead of just letting it happen and having to respond to it. By checking first, you can avoid having to do all the work for the `INSERT` which is just going to fail - so even in terms of performance, I believe #1 would be preferable. – marc_s Jun 17 '20 at 11:10
  • 1
    And why is this a concern? You are using a synthetic ID so why would your application logic EVER attempt to insert a row with the same ID? Even if it did, your code must always handle errors correctly - a PK collision is just one of those errors. Still using 2008? You have more important things to worry about. – SMor Jun 17 '20 at 11:25
  • Using 2008, is our client limitation, and why we are attempt to insert a same ID is bcs we are getting a multiple request of same ID from other Components which we interact. – Pavan Jun 17 '20 at 11:38
  • 1
    @SMor Why I am asking this Question?? Answer goes like this,,Many of my colleagues including SQL experts are telling Case#2 is a hacky way of implementation, This doesn't comes under standard pattern all that,,So to get clarity by worldwide community,, I am interested – Pavan Jun 17 '20 at 11:54
  • The difference between #1 and #2 is that #2 is *a* correct solution (that is, it will result in correct observable behaviour), whereas #1 is *wrong*. #1 will result in primary key violations under heavy load, which will need to be handled anyway - it's highly unlikely with `uniqueidentifier` specifically though. See the duplicate link for other possible correct solutions. – GSerg Jun 17 '20 at 12:12
  • I definitely read an article years ago that recommended #2. I don't remember the reasoning. I'm pretty sure this has already been asked on here, but it will be difficult to find – Nick.Mc Jun 17 '20 at 12:18
  • 1
    I think you should rephrase your Question to not ask for which is the better way, but instead ask for the benefits of the one way compared to the other. The reason is that curators will see it and think you are asking an opinion based Question. See [What types of questions should I avoid asking?](https://stackoverflow.com/help/dont-ask) – Scratte Jun 17 '20 at 12:21
  • @Scratte Agree,,Thanks – Pavan Jun 17 '20 at 12:24
  • 1
    Of course changing the phrasing will not prevent your Question from being a duplicate. But at least you now have 11 more answers :) – Scratte Jun 17 '20 at 12:27
  • But there are no answer explanation by considering the number of sql operation on a table, Anyway I will delete my question. – Pavan Jun 17 '20 at 12:32

1 Answers1

1

You can use the following block when data modifications are applied:

SET XACT_ABORT, NOCOUNT ON;

BEGIN TRY

    BEGIN TRANSACTION;
    -- CODE BLOCK GOES HERE
    COMMIT TRANSACTION;

END TRY
BEGIN CATCH 

   IF @@TRANCOUNT > 0
   BEGIN
      ROLLBACK TRANSACTION
   END;

   -- GET ERRORS DETAILS OR THROW ERROR

END CATCH;

SET XACT_ABORT, NOCOUNT OFF;

In addition, if your primary key is UNIQUEIDENTIFER and you are generating the new values with NEWID() function you can be calm - the CATCH block will not be executed.

But, using such value as primary key is not good, because:

  • it takes more space than number types like (smallint, int, bigint)
  • in case NEWID() is used to generate the values, they will be inserted in random pages of your index which will lead to fragmentation; in case of IDENTITY column, the new rows are inserted in the last page; you can although fix this using NEWSEQUENTIALID
gotqn
  • 42,737
  • 46
  • 157
  • 243
  • I don't see how this answers the question. – GSerg Jun 17 '20 at 11:17
  • It gives improved variant of the second option. The first variant is risky, as during the `IF` execution the record may not exists, but during the `INSERT` such to be already committed and to raise an error again. Also, as I have written - if `NEWID` or `NEWSEQUENTIALID` is used there is no need to worry. – gotqn Jun 17 '20 at 11:19
  • I agree GUID takes more space, but that would come under completely different question, Here my question was focusing on sql check operation for duplicates, In Case#1 -- 2 times sql check on table is happening, – Pavan Jun 17 '20 at 11:31
  • @PavanBhat As I said for me is better to use case 2 as if you are using internal functions to generate the value the possibility of duplicates is minor. – gotqn Jun 17 '20 at 11:42
  • @gotqn Assume insert for same ID is not happening in concurrent way, Even then would you prefer to use case#2 ??, bcs I am thinking in numbers of sql operation wise on a table – Pavan Jun 17 '20 at 11:51
  • I will use the first case if I am importing a large result set. For example, image you are importing 1 million rows. The operation will consume some time to import 90% of the rows, and then it will fail because of primary key violation. In this case, there is no point to performing the `crud` as it will take time to rollback and no work will be done. In your case I doubt you will get performance benefits using one or the other, so I will prefer to second one as it is clear - we are leaving the data integrity check to the SQL server engine. – gotqn Jun 17 '20 at 12:08
  • @gotqn It really was helpful, But our client feels, leaving the data integrity check to the SQL server engine is not the right way..Anyway Thanks for your info. – Pavan Jun 17 '20 at 12:46