0

I am wondering if there is a way to do this through one query.

Seems when I was initially populating my DB with dummy data to work with 10k records, somewhere in the mess of it all the script dummped an extra 1,044 rows where the rows are duplicates. I determined this using

SELECT x.ID, x.firstname FROM info x
INNER JOIN (SELECT ID FROM info
GROUP BY ID HAVING count(id) > 1) d ON x.ID = d.ID

What I am trying to figure out is through this single query can I add another piece to it that will remove one of the matching dupes from each dupe found?

also I realize the ID column should have been set to auto increment, but it wasn't

chris
  • 36,115
  • 52
  • 143
  • 252
  • 2
    possible duplicate of [how to delete duplicates on mysql table?](http://stackoverflow.com/questions/2630440/how-to-delete-duplicates-on-mysql-table) – dee-see Feb 18 '12 at 01:05
  • possible duplicate of [MySQL remove duplicates from big database quick](http://stackoverflow.com/questions/1651999/mysql-remove-duplicates-from-big-database-quick) – Book Of Zeus Feb 18 '12 at 01:07
  • You would want to be very careful not to delete both of a pair. Is there anything else distinct about the dups? Do you have any unique fields, like an auto_increment column, that could be used to isolate them? – Umbrella Feb 18 '12 at 01:07
  • Only on some of them theres distinct other values, some of them just seem to be exact dupes – chris Feb 18 '12 at 01:10

3 Answers3

4

My favorite way of removing duplicates would be:

ALTER IGNORE TABLE info ADD UNIQUE (ID);

To explain a bit further (for reference, take a look here)

UNIQUE - you are adding unique index to ID column.

IGNORE - is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

Aleksandar Vucetic
  • 14,715
  • 9
  • 53
  • 56
  • Could you explain what this does a little further, ultimately I dont want to delete or change anything til I am sure things are good with the table – chris Feb 18 '12 at 01:52
0

The most efficient way is you do it in below steps:

Step 1: Move the non duplicates (unique tuples) into a temporary table

CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY [column to remove duplicates by];

Step 2: delete delete the old table.We no longer need the table with all the duplicate entries, so drop it!

DROP TABLE old_table;

Step 3: rename the new_table to the name of the old_table

RENAME TABLE new_table TO old_table;
0

The query that I use is generally something like

Delete from table where id in (
   Select Max(id) from table
    Group by (DUPFIELD)
     Having count (*)>1)

You have to run this several times since it all only remove one duplicated row at a time, but it's fast.

Brian Hoover
  • 7,861
  • 2
  • 28
  • 41