Select a record that has a duplicate

Question

I'd like to select all records from a table (names) where lastname is not unique. Preferably I would like to delete all records that are duplicates.

How would this be done? Assume that I don't want to rerun one query multiple times until it quits.

When deleting duplicates, is there a particular record you wish to keep? IE: The one with lowest ID value, etc? — OMG Ponies, Jun 09 '10 at 15:53

score 3 · Answer 1 · edited Jun 09 '10 at 15:49

3

To find which lastnames have duplicates:

  SELECT lastname, COUNT(lastname) AS rowcount 
    FROM table 
GROUP BY lastname 
  HAVING rowcount > 1

To delete one of the duplicates of all the last names. Run until it doesn't do anything. Not very graceful.

DELETE FROM table 
 WHERE id IN (SELECT id 
                FROM (SELECT * FROM table) AS t 
            GROUP BY lastname 
              HAVING COUNT(lastname) > 1)

edited Jun 09 '10 at 15:49

OMG Ponies

325,700
82
523
502

answered Jun 09 '10 at 13:58

Edward Dale

29,597
13
90
129

Now write that as a delete please. :) – Josh K Jun 09 '10 at 14:42
I'm tempted to downvote simply because of the crappy second query. Surely there must be a simpler way then to re-run a query until it stops. – Josh K Jun 09 '10 at 15:38

score 2 · Accepted Answer · answered Jun 09 '10 at 14:20

2

The fastest and easiest way to delete duplicate records is my issuing a very simple command.

ALTER IGNORE TABLE [TABLENAME] ADD UNIQUE INDEX UNIQUE_INDEX ([FIELDNAME])

This will lock the table, if this is an issue, try:

delete t1 from table1 t1, table2 t2
where table1.duplicate_field= table2.duplicate_field (add more if need ie. and table.duplicate_field2=table2.duplicate_field2)
and table1.unique_field > table2.unique_field
and breakup into ranges to run faster

answered Jun 09 '10 at 14:20

Gary

2,866
1
17
20

Locking the table isn't an issue. The issue is there already duplicate rows. – Josh K Jun 09 '10 at 14:38
If locking is not an issue, then executing ALTER IGNORE TABLE [TABLENAME] ADD UNIQUE INDEX UNIQUE_INDEX ([FIELDNAME]) will rebuild the table and remove the duplicate records. – Gary Jun 09 '10 at 15:07
1

You can't apply a constraint if the data doesn't satisfy it - your suggestion would not work. – OMG Ponies Jun 09 '10 at 15:47
+1 and accepted. Locked the table temporarily and went to work. No duplicates and no more will be added. – Josh K Jun 09 '10 at 15:58
OMG, it does work. The IGNORE is the key part of what you are missing. – Gary Jun 09 '10 at 16:00

score 0 · Answer 3 · edited May 23 '17 at 12:01

0

dup How can I remove duplicate rows?

DELETE names
FROM names
LEFT OUTER JOIN (
   SELECT MIN(RowId) as RowId, lastname 
   FROM names
   GROUP BY lastname 
) as KeepRows ON
   names.lastname = KeepRows.lastname 
WHERE
   KeepRows.RowId IS NULL

assumption: you have an RowId column

edited May 23 '17 at 12:01

Community

1
1

answered Jun 09 '10 at 14:00

Glennular

17,827
9
58
77

score 0 · Answer 4 · answered Jun 09 '10 at 14:01

0

SELECT COUNT(*) as mycountvar FROM names GROUP BY lastname WHERE mycountvar > 1;

and then

DELETE FROM names WHERE lastname = '$mylastnamevar' LIMIT $mycountvar-1

but: why don't you just flag the fielt "lastname" als unique, so it isn't possible that duplicates can come in?

answered Jun 09 '10 at 14:01

oezi

51,017
10
98
115

Because duplicates are already in the table. I'm trying to add `lastname` as a `UNIQUE INDEX`. – Josh K Jun 09 '10 at 14:30

Select a record that has a duplicate

4 Answers4