Duplicate entries with different timestamp

Question

I have a Table for Customers by name : Customer_SCD in SQL I have 3 Columns present in it : Customer_Name, Customer_ID Customer_TimeStamp

There are duplicate entries in this table with different Timestamp.

For example

ABC, 1, 2012-12-05 11:58:20.370

ABC, 1, 2012-12-03 12:11:09.840

I want to eliminate this from the database and keep the first time/date available.

Thanks.

Once you've tidied up, remember to add a unique constraint (or maybe multiple ones) to the table so that you don't have to do this job again in 6 months time. — Damien_The_Unbeliever, Dec 18 '12 at 07:25

Ivan Golović · Accepted Answer · 2012-12-18T07:29:25.833

This works, try it:

DELETE  Customer_SCD
OUTPUT  deleted.*
FROM    Customer_SCD b
JOIN    (
    SELECT  MIN(a.Customer_TimeStamp) Customer_TimeStamp,
            Customer_ID,
            Customer_Name
    FROM    Customer_SCD a
    GROUP   BY a.Customer_ID, a.Customer_Name
) c ON 
    c.Customer_ID = b.Customer_ID
AND c.Customer_Name = b.Customer_Name
AND c.Customer_TimeStamp <> b.Customer_TimeStamp

In a subquery it determines which record is the first one for every Customer_Name,Customer_ID and then it deletes all the other records for a duplicate. I also added the OUTPUT clause which returns rows affected by the statement.

You could also do it by using ranking function ROW_NUMBER:

DELETE  Customer_SCD
OUTPUT  deleted.*
FROM    Customer_SCD b
JOIN    (
    SELECT  Customer_ID,
            Customer_Name,
            Customer_TimeStamp,
            ROW_NUMBER() OVER (PARTITION BY Customer_ID, Customer_Name ORDER BY Customer_TimeStamp) num
    FROM    Customer_SCD
) c ON 
    c.Customer_ID = b.Customer_ID
AND c.Customer_Name = b.Customer_Name
AND c.Customer_TimeStamp = b.Customer_TimeStamp
AND c.num <> 1

See which one has a smaller query cost and use it, when I checked it, first approach was more efficient (it had a better execution plan).

Here's an SQL Fiddle

score 0 · Answer 2 · answered Dec 18 '12 at 07:09

The following query will give you the results you want to keep.

Select Customer_Name, Customer_ID, MIN(Customer_TimeStamp) as Customer_TimeStamp
from Customer_SCD 
group by Customer_Name, Customer_ID

store the result in a table variable, say @correctTbl

then join with this table and remove duplicates

delete 
from Customer_SCD a
inner join @correctTbl b on a.Customer_Name = b.Customer_Name and a.Customer_ID = b.Customer_ID and a.Customer_TimeStamp != b.Customer_TimeStamp

Duplicate entries with different timestamp

2 Answers2

Linked