Remove duplicate entries in a MySQL table

Question

I have a table with a few thousand rows. The table contains two columns, name and email. I have several duplicate rows, for example:

John Smith | john@smith.com
John Smith | john@smith.com
Erica Smith | erica@smith.com
Erica Smith | erica@smith.com

What would be the easiest way to delete all duplicate results. For example, such that the table's content would = SELECT name, DISTINCT(email) FROM table.

one way to do is using primary key. so you have primary key in your table or just name and email only? — AJP, Mar 26 '12 at 21:27

Umbrella · Accepted Answer · 2012-03-26T21:35:17.373

You could pretty easily do this by selecting that query into another table, then renaming it to replace the original.

CREATE TABLE `table2` (
  `name` varchar(255), 
  `email` varchar(255), 
  UNIQUE KEY `email` (`email`));
INSERT INTO `table2` SELECT `name`, DISTINCT(`email`) FROM `table`;
RENAME TABLE `table` TO `table1`;
RENAME TABLE `table2` TO `table`;

Note that this CREATE should be adjusted to your actual table format. I added the unique key on the email field as a suggestion on how you would prevent duplicates in the first place.

Alternatively, you could loop over this

DELETE FROM `table` 
WHERE `email` IN (
  SELECT `email` FROM `table` GROUP BY `email` HAVING count(*) > 1
) LIMIT 1

Which would delete one duplicate record per call. The importance of the limit is to not remove both rows for any duplicate

score 2 · Answer 2 · answered Mar 26 '12 at 21:25

2

The easiest way would be to copy all distinct values into a new table:

select distinct *
into NewTable
from MyTable

answered Mar 26 '12 at 21:25

keyser

18,829
16
59
101

score 1 · Answer 3 · answered Mar 26 '12 at 21:26

1

DELETE FROM table
WHERE id 
NOT IN
(SELECT A.id
FROM 
(
SELECT name,MAX(id) AS id
FROM table
GROUP BY name
) A
)

answered Mar 26 '12 at 21:26

Teja

13,214
36
93
155

score 1 · Answer 4 · answered Mar 26 '12 at 21:34

Add an auto-increment field to the table. I believe that when you add it, it will be 'backfilled' for you. Since MySql doesn't allow a delete based on a subquery against the same table, the easiest solution is to then dump the whole dataset into a temptable for use in processing. Assuming you called the new field RowId and the temp table tempTable, you can then use code like this:

DELETE FROM NameAndEmail
LEFT JOIN 
(     SELECT name, email, Max(RowId) as MaxRowId 
      FROM temptable 
      GROUP BY name, email
) as MaxId
WHERE NameAndEmail.Email = MaxId.Email
     and NameAndEmail.Name = MaxId.Name
     and NameAndEmail.RowId <> MaxId.RowId

score 1 · Answer 5 · edited Jun 20 '20 at 09:12

1

Add a unique index

The simplest way to clean up a table with duplicate data - is to just add a unique index:

set session old_alter_table=1;
ALTER IGNORE TABLE `table` ADD UNIQUE INDEX (name, email);

Pay particular attention to the first sql statement, without it the IGNORE flag is ignored and the alter table statement will fail with an error.

edited Jun 20 '20 at 09:12

Community

1
1

answered Mar 26 '12 at 21:41

AD7six

63,116
12
91
123

Remove duplicate entries in a MySQL table

5 Answers5

Add a unique index

Linked