1

I have the following SQL to delete duplicate values form a table,

DELETE p1 
FROM `ProgramsList` p1, `ProgramsList` p2  
WHERE p1.CustId = p2.CustId 
    AND p1.CustId = 1 
    AND p1.`Id`>p2.`Id` 
    AND p1.`ProgramName` = p2.`ProgramName`;

Id is auto incremental
for a given CustId ProgramName must be unique (currently it is not)
The above SQL takes about 4 to 5 hours to complete with about 1,000,000 records

Could anyone suggest a quicker way of deleting duplicates from a table?

Dharman
  • 30,962
  • 25
  • 85
  • 135
  • possible duplicate of [Deleting duplicate rows from a table](http://stackoverflow.com/questions/1043488/deleting-duplicate-rows-from-a-table) – John Conde Feb 11 '13 at 16:55
  • Maybe inserting the unique values in a temp array like this? http://stackoverflow.com/a/1652413/217180 – Xavi Esteve Feb 11 '13 at 16:56

2 Answers2

1

First, You might try adding indexes to ProgramName and CustID fields if you don't already have them.

De-Duping

You can group your records to identify dupes, and as you are doing that, grab the min ID value for each group. Then, just delete all records whose ID is not one of the MinID's.

In-Clause Method

delete from
 ProgramsList
where
 id not in 
    (select min(id) as MinID
      from ProgramsList
      group by ProgramName, CustID) 

Join-Method

You may have to run this more than once, if there are many members per group.

DELETE P
FROM ProgramsList as P
INNER JOIN 
    (select count(*) as Count, max(id) as MaxID
     from ProgramsList
     group by ProgramName, CustID) as A on A.MaxID = P.id
WHERE A.Count >= 2

Some people have performance issues with the In-Clause, some don't. It depends a lot on your indexes and such. If one is too slow, try the other.

Related: https://stackoverflow.com/a/4192849/127880

Community
  • 1
  • 1
Brian Webster
  • 30,033
  • 48
  • 152
  • 225
0

This will remove all the duplicates in one go.

From the inner query an ID is got which is not deleted and the rest is deleted for each of the program.

delete p from ProgramsList as p
INNER JOIN (select ProgramName as Pname, max(id) as MaxID
     from ProgramsList
     group by ProgramName, CustID order by null) as A on  Pname=P.ProgramName
    where A.MaxID != P.id
kleopatra
  • 51,061
  • 28
  • 99
  • 211