2

So I have a situation here where I need to delete duplicate records. The Scenario here is I have 5 key fields. 4 fields are the actual data and 5th one is the auto generated unique number. So when I check for counts using group by using only these 4 fields, I have duplicates of around 400 records. But unique number for these duplicate records are different(we can delete the record which has max value for unique number). Is there a way to delete these duplicate records ? Also these are 6 other non primary key fields but they are just for reference sake.

Say for example :

+----------+-------+---------+---------+-----------+
| Field    | Field | Field 3 | Field 4 | Unique,Id |
|   1      |   2   |         |         |           |
+----------+-------+---------+---------+-----------+
| Batman   | 100   | 50      | Chicago | 100       |
+----------+-------+---------+---------+-----------+
| Batman   | 100   | 50      | Chicago | 101       |
+----------+-------+---------+---------+-----------+
| Superman | 50    | 50      | Florida | 102       |
+----------+-------+---------+---------+-----------+
| Flash    | 90    | 40      | London  | 103       |
+----------+-------+---------+---------+-----------+
| Flash    | 90    | 40      | London  | 104       |
+----------+-------+---------+---------+-----------+

In this example, the first 4 fields are actual data fields.

So you can see Batman and flash have duplicate values(for first 4 fields) but the unique id is different. So here I need to delete one record from each so that they don't remain as duplicates. The maximum value of unique id for that duplicate record can be deleted. So 102 and 104 is the max unique id value for Batman and Flash and they should be deleted.

Please help me on this.

Giorgos Betsos
  • 71,379
  • 9
  • 63
  • 98
Parsu
  • 83
  • 5

2 Answers2

0

Please try this query:

DELETE 
FROM 
    MYTABLE 
FROM 
    MYTABLE
LEFT JOIN
(
SELECT 
    FIELD1,FIELD2,FIELD3,FIELD4,MIN(ID) AS ID
FROM
    MYTABLE
GROUP BY 
    FIELD1,FIELD2,FIELD3,FIELD4
)TbMain
ON 
    MYTABLE.FIELD1=TbMain.FIELD1
    AND
    MYTABLE.FIELD2=TbMain.FIELD2
    AND
    MYTABLE.FIELD3=TbMain.FIELD3
    AND
    MYTABLE.ID =TbMain.ID
WHERE
    TbMain.ID IS NULL
Mojtaba Rezaie
  • 98
  • 1
  • 10
-1

You can do it with a CTE.

WITH cte as (
select
 ROW_NUMBER() OVER ( PARTITION BY FIELD1 ORDER BY ID) FirstOcurrence,
*
from mySuperHeroTable 
)
delete from cte where FirstOcurrence <> 1

EDIT:

I would suggested that you add a Unique constraint on Field1 after you run the delete. This way you won't run into this situation again.

ALTER TABLE mySuperHeroTable ADD CONSTRAINT UQ_mySuperHeroTable_Field1 UNIQUE (Field1);
mxix
  • 3,539
  • 1
  • 16
  • 23
  • Shouldn't we partition by all four fields? – Akanksha Singh May 20 '16 at 10:23
  • This will only keep the first Batman Row and the first flash row (order by ID (ASC)). And ofcourse the Superman row which is unique. – mxix May 20 '16 at 11:15
  • I understand that. Actually ,reading the question, I thought may be there can more than 1 rows with Field 1 Batman but different values in Field 2,3 &4. And we might want to keep them, as in a whole, row would not be duplicate. But anyway, looks like this is not the case here. Parsu wants only 1 row with Batman, irrespective of values in Field 2,3 & 4. – Akanksha Singh May 20 '16 at 11:58