1

I'd like to select all records from a table (names) where lastname is not unique. Preferably I would like to delete all records that are duplicates.

How would this be done? Assume that I don't want to rerun one query multiple times until it quits.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Josh K
  • 28,364
  • 20
  • 86
  • 132

4 Answers4

3

To find which lastnames have duplicates:

  SELECT lastname, COUNT(lastname) AS rowcount 
    FROM table 
GROUP BY lastname 
  HAVING rowcount > 1

To delete one of the duplicates of all the last names. Run until it doesn't do anything. Not very graceful.

DELETE FROM table 
 WHERE id IN (SELECT id 
                FROM (SELECT * FROM table) AS t 
            GROUP BY lastname 
              HAVING COUNT(lastname) > 1)
OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
Edward Dale
  • 29,597
  • 13
  • 90
  • 129
2

The fastest and easiest way to delete duplicate records is my issuing a very simple command.

ALTER IGNORE TABLE [TABLENAME] ADD UNIQUE INDEX UNIQUE_INDEX ([FIELDNAME])

This will lock the table, if this is an issue, try:

delete t1 from table1 t1, table2 t2
where table1.duplicate_field= table2.duplicate_field (add more if need ie. and table.duplicate_field2=table2.duplicate_field2)
and table1.unique_field > table2.unique_field
and breakup into ranges to run faster

Gary
  • 2,866
  • 1
  • 17
  • 20
  • Locking the table isn't an issue. The issue is there already duplicate rows. – Josh K Jun 09 '10 at 14:38
  • If locking is not an issue, then executing ALTER IGNORE TABLE [TABLENAME] ADD UNIQUE INDEX UNIQUE_INDEX ([FIELDNAME]) will rebuild the table and remove the duplicate records. – Gary Jun 09 '10 at 15:07
  • 1
    You can't apply a constraint if the data doesn't satisfy it - your suggestion would not work. – OMG Ponies Jun 09 '10 at 15:47
  • +1 and accepted. Locked the table temporarily and went to work. No duplicates and no more will be added. – Josh K Jun 09 '10 at 15:58
  • OMG, it does work. The IGNORE is the key part of what you are missing. – Gary Jun 09 '10 at 16:00
0

dup How can I remove duplicate rows?

DELETE names
FROM names
LEFT OUTER JOIN (
   SELECT MIN(RowId) as RowId, lastname 
   FROM names
   GROUP BY lastname 
) as KeepRows ON
   names.lastname = KeepRows.lastname 
WHERE
   KeepRows.RowId IS NULL

assumption: you have an RowId column

Community
  • 1
  • 1
Glennular
  • 17,827
  • 9
  • 58
  • 77
0
SELECT COUNT(*) as mycountvar FROM names GROUP BY lastname WHERE mycountvar > 1;

and then

DELETE FROM names WHERE lastname = '$mylastnamevar' LIMIT $mycountvar-1

but: why don't you just flag the fielt "lastname" als unique, so it isn't possible that duplicates can come in?

oezi
  • 51,017
  • 10
  • 98
  • 115
  • Because duplicates are already in the table. I'm trying to add `lastname` as a `UNIQUE INDEX`. – Josh K Jun 09 '10 at 14:30