1

I have a a table called Process that has columns something like this

Hold, GId, Source, Type, Operation

and Temp_Table that looks something like this

Hold, GId, Source, Type

Hold_Table looks something like this

Hold, GId, Source, Type

I compare Temp_Table and Hold_Table and insert the record into Process table with "Add" Operation if the Source is not found in Hold_Inv table but I'm just wondering how can I prevent it from inserting duplicated value into Process table.

INSERT INTO Process (Hold, GId, Source, Type, Operation)
    SELECT Hold, GID, Source, Type, 'ADD' Operation 
    FROM Temp_Table
    WHERE Source IN (
        SELECT Source  
        FROM (
            SELECT Source 
            FROM Temp_Table Thi
            WHERE NOT EXISTS (
                SELECT 1
                FROM Hold_Table Hi 
                WHERE Thi.Source = Hi.Source
            )
            AND Thi.Source <> 'NOT FOUND'
            AND LEN(TRIM(ISNULL(Thi.Source, ''))) > 0
        ) T1
    )  

I run the query twice then the record from Temp_Table are inserted twice into the Process table.

It should be inserting twice if one column value is different but if they all the same then it shouldn't be inserting twice anymore.

Any help or suggestion would be really appreciated.

Dale K
  • 25,246
  • 15
  • 42
  • 71
aasenomad
  • 405
  • 2
  • 11
  • Use a NOT EXIST subquery, or create a constraint that wont let you insert duplicate values (if there is a specific column that you use to say if it exists or not). – Brad Jun 08 '23 at 19:40
  • Hi Brad. Could you pls show me how to use a sub NOT Exist subquery?. I only want to insert again if at least one column value is different. if all the columns value are the same then don't insert anymore in Process table. – aasenomad Jun 08 '23 at 19:42
  • Here is an example, if you need multiple columns you can just add those to the query in the NOT EXIST: https://stackoverflow.com/a/5288494/1459036 – Brad Jun 08 '23 at 20:00
  • 1
    Create a unique constraint with `IGNORE_DUP_KEY` – Stu Jun 08 '23 at 20:04
  • The best way of doing this is context dependent. You can use `IGNORE_DUP_KEY` as @Stu suggests but if this is defined on a clustered index this can be slow [if there is a high probability of duplicates](https://sqlperformance.com/2019/04/sql-performance/ignore_dup_key-slower-clustered-indexes) - and if defined on a nonclustered index it will take serializable range locks that are held until the end of the transaction (which may be a problem depending on context) - if you do `NOT EXISTS` you would also need range locks or retry on error though to be thread safe. – Martin Smith Jun 09 '23 at 01:06
  • So possibly the "best" way might be `IGNORE_DUP_KEY` on a clustered/unique index and also a `NOT EXISTS` without any locking hints. The `NOT EXISTS` then makes an initial non thread safe attempt at removing duplicates without taking serializable locks to actually prevent a concurrent insert adding one and if the odd one does still get through this can be removed by the storage engine checks with most of the cost of this removed. – Martin Smith Jun 09 '23 at 01:24

1 Answers1

1

First of, your query can be simplified from :

INSERT INTO Process (Hold, GId, Source, Type, Operation)
SELECT Hold, GID, [Source], [Type], 'ADD' Operation 
FROM Temp_Table
WHERE Source IN (
        SELECT Source  
        FROM (
            SELECT Source 
            FROM Temp_Table Thi
            WHERE NOT EXISTS (
                SELECT 1
                FROM Hold_Table Hi 
                WHERE Thi.Source = Hi.Source
            )
            AND Thi.Source <> 'NOT FOUND'
            AND LEN(TRIM(ISNULL(Thi.Source, ''))) > 0
        ) T1
    )  

To :

INSERT INTO Process (Hold, GId, Source, Type, Operation)
SELECT Hold, GID, [Source], [Type], 'ADD' Operation 
FROM Temp_Table Thi
    WHERE NOT EXISTS (
        SELECT 1
        FROM Hold_Table Hi 
        WHERE Thi.Source = Hi.Source
    )
    AND Thi.Source <> 'NOT FOUND'
    AND LEN(TRIM(ISNULL(Thi.Source, ''))) > 0

To exclude insert of duplicates, you can add the target table in the SQL in the NOT EXISTS :

INSERT INTO Process (Hold, GId, Source, Type, Operation)
SELECT Hold, GID, [Source], [Type], 'ADD' Operation 
FROM Temp_Table Thi
WHERE 
NOT EXISTS (
    SELECT 1
    FROM Hold_Table Hi 
    WHERE Thi.Source = Hi.Source
)
AND Thi.Source <> 'NOT FOUND'
AND LEN(TRIM(ISNULL(Thi.Source, ''))) > 0
AND NOT EXISTS (
    SELECT 1 
    FROM Process Pro
    WHERE 
        Thi.Hold = Pro.Hold and 
        Thi.GID = Pro.GID and 
        Thi.[Source] = Pro.[Source] and
        Thi.[Type] = Pro.[Type] 
)
Andre Liu
  • 41
  • 4