Remove Duplicate Rows MySQL

Question

I have been trying to delete duplicate rows from a table, but all my attempts either end in an error or get stuck during execution. My table has 16.8 million records, including 1.5 million duplicates. The table structure is as follows:

--------------------------------------
| id | number | city | region | site |
--------------------------------------
| 1  | 12345  | abc  | xyz    | 321  |
| 2  | 67890  | def  | axc    | 167  |
| 3  | 12345  | abc  | xyz    | 321  |
| 4  | 13400  | fff  | aaa    | 301  |
--------------------------------------

I have tried some of the approaches suggested in answers here on Stack Overflow, but couldn't find one that worked for me.

DELETE n1 FROM data n1, data n2 WHERE n1.id > n2.id AND n1.number = n2.number

That didn't work, so I tried the following:

DELETE FROM data where data.number in 
(
    SELECT number from data GROUP BY number HAVING COUNT(*)>1
)
LIMIT 1

That was no use either, so I am stuck. All suggestions are welcome.
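Besides MySQL rejecting a DELETE whose subquery reads the table being deleted from (error 1093), the second attempt has a logic problem: the subquery returns every number that occurs more than once, so the DELETE matches all copies of a duplicated row, including the one meant to be kept. The minimal sketch below uses Python's built-in sqlite3 module with an in-memory stand-in for the `data` table (SQLite, not MySQL; the helper `make_sample` is hypothetical) to show the effect on the four sample rows. `LIMIT 1` is left out because SQLite only accepts `DELETE ... LIMIT` when compiled with `SQLITE_ENABLE_UPDATE_DELETE_LIMIT`.

```python
import sqlite3


def make_sample() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, number INTEGER, "
        "city TEXT, region TEXT, site INTEGER)"
    )
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?, ?, ?)",
        [
            (1, 12345, "abc", "xyz", 321),
            (2, 67890, "def", "axc", 167),
            (3, 12345, "abc", "xyz", 321),
            (4, 13400, "fff", "aaa", 301),
        ],
    )
    return conn


conn = make_sample()
# SQLite accepts a subquery on the target table here; MySQL rejects this shape.
conn.execute(
    "DELETE FROM data WHERE number IN "
    "(SELECT number FROM data GROUP BY number HAVING COUNT(*) > 1)"
)
remaining = [row[0] for row in conn.execute("SELECT id FROM data ORDER BY id")]
print(remaining)  # [2, 4]: both copies of 12345 are gone, not just the extra one
```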

THE SOLUTION THAT WORKED FOR ME

Marc-B marked the post as a duplicate of stackoverflow.com/a/3312066/1528290. I tried that approach and it worked like a charm. My query was:

alter ignore table data add unique i_number (number)
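`ALTER IGNORE TABLE` was deprecated in MySQL 5.6.17 and removed in 5.7.4, so on newer servers the same idea is usually expressed by copying the rows into a new table that carries the unique key, with `INSERT IGNORE` skipping the rows that collide. Also note that a unique key on `number` alone drops rows that share a number but differ in city, region, or site; given the clarification in the comments that duplicates match in every column except `id`, the sketch below keys on all four. It runs in SQLite, where `INSERT OR IGNORE` plays the role of MySQL's `INSERT IGNORE`; the table name `data_dedup` and the helper `make_sample` are hypothetical.

```python
import sqlite3


def make_sample() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, number INTEGER, "
        "city TEXT, region TEXT, site INTEGER)"
    )
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?, ?, ?)",
        [
            (1, 12345, "abc", "xyz", 321),
            (2, 67890, "def", "axc", 167),
            (3, 12345, "abc", "xyz", 321),
            (4, 13400, "fff", "aaa", 301),
        ],
    )
    return conn


conn = make_sample()
conn.executescript(
    """
    -- Hypothetical new table with a UNIQUE key over every column except id.
    CREATE TABLE data_dedup (
        id INTEGER PRIMARY KEY,
        number INTEGER, city TEXT, region TEXT, site INTEGER,
        UNIQUE (number, city, region, site)
    );
    -- Copy in id order: later duplicates hit the UNIQUE key and are skipped,
    -- so the lowest id of each group survives.
    INSERT OR IGNORE INTO data_dedup SELECT * FROM data ORDER BY id;
    DROP TABLE data;
    ALTER TABLE data_dedup RENAME TO data;
    """
)
remaining = [row[0] for row in conn.execute("SELECT id FROM data ORDER BY id")]
print(remaining)  # [1, 2, 4]
```

One caveat for both engines: a UNIQUE key treats NULLs as distinct, so rows with a NULL in any of the four columns are never collapsed by this method.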

http://stackoverflow.com/questions/30401571/how-to-remove-duplicate-row-considering-the-arabic-phonetics/30402156#30402156 — Uueerdo, May 28 '15 at 17:48
@B-Abbasi Do you want to delete every occurrence of the duplicate rows, or keep a single row from each set of duplicates? — DfrDkn, May 28 '15 at 17:48
Why did the `DELETE n1 FROM data n1, data n2 WHERE n1.id > n2.id AND n1.number = n2.number` not work? What error was given? — johnjps111, May 28 '15 at 17:52
You haven't actually defined what a duplicate record would be. The same number in any record? The same number for a city, region, site combination? — AndySavage, May 28 '15 at 17:54
Dupes across columns 2, 3, 4 and 5, keeping the min id? i.e. keep id=1 and kill id=3 in your dataset? — Drew, May 28 '15 at 18:00
@AndySavage Duplicate rows have the same value in every column except id, which is the primary key. — B-Abbasi, May 28 '15 at 18:01
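With that definition (same values in every column except `id`), a GROUP BY over the four data columns lists each duplicate group before anything is deleted, along with the id that a keep-the-minimum strategy would retain. The sketch runs the query in SQLite against the sample rows; the helper `make_sample` and the column aliases `copies` and `keep_id` are hypothetical.

```python
import sqlite3


def make_sample() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, number INTEGER, "
        "city TEXT, region TEXT, site INTEGER)"
    )
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?, ?, ?)",
        [
            (1, 12345, "abc", "xyz", 321),
            (2, 67890, "def", "axc", 167),
            (3, 12345, "abc", "xyz", 321),
            (4, 13400, "fff", "aaa", 301),
        ],
    )
    return conn


conn = make_sample()
# One output row per group of identical rows; MIN(id) is the row to keep.
dupes = conn.execute(
    """
    SELECT number, city, region, site, COUNT(*) AS copies, MIN(id) AS keep_id
    FROM data
    GROUP BY number, city, region, site
    HAVING COUNT(*) > 1
    """
).fetchall()
# Surplus rows = copies beyond the first in each group (the rows to delete).
surplus = sum(row[4] - 1 for row in dupes)
print(dupes)    # [(12345, 'abc', 'xyz', 321, 2, 1)]
print(surplus)  # 1
```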

score 0 · Answer 1 · edited May 28 '15 at 19:53


Assuming that duplicates are identified by the number column, try this:

DELETE FROM data
 WHERE id NOT IN (SELECT *
                    FROM (SELECT MAX(n.id)
                            FROM data n
                        GROUP BY n.number) x);

This keeps one row for each number (the one with the highest id) and deletes the rest of the rows in your table.
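Applied to the asker's definition of a duplicate, the NOT IN approach groups on all four data columns, and the essential point is that the outer WHERE tests `id`, not `number`, against the set of ids to keep; comparing `number` with a list of ids would match almost no rows and delete nearly everything. A sketch in SQLite follows, keeping the highest id per group as the answer does; the helper `make_sample` and the alias `keep_id` are hypothetical, and the derived table `x` mirrors the answer's MySQL workaround (SQLite does not need it, but accepts it).

```python
import sqlite3


def make_sample() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, number INTEGER, "
        "city TEXT, region TEXT, site INTEGER)"
    )
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?, ?, ?)",
        [
            (1, 12345, "abc", "xyz", 321),
            (2, 67890, "def", "axc", 167),
            (3, 12345, "abc", "xyz", 321),
            (4, 13400, "fff", "aaa", 301),
        ],
    )
    return conn


conn = make_sample()
# Delete every row whose id is not the MAX(id) of its group of identical rows.
conn.execute(
    """
    DELETE FROM data
    WHERE id NOT IN (
        SELECT keep_id FROM (
            SELECT MAX(id) AS keep_id
            FROM data
            GROUP BY number, city, region, site
        ) AS x
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT id FROM data ORDER BY id")]
print(remaining)  # [2, 3, 4]: id 1 is removed, id 3 is kept as the highest id
```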

EDIT:

I just tested your query and it worked for me:

DELETE n1 FROM foobarred n1, foobarred n2 
WHERE n1.id > n2.id AND n1.number = n2.number;

SQLFIDDLE DEMO
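The self-join query keeps the lowest id of each group: row n1 is deleted whenever some other row n2 with the same values and a smaller id exists. SQLite has no multi-table DELETE, so this sketch expresses the same rule with a correlated EXISTS subquery and compares all four data columns, per the asker's definition; `make_sample` is a hypothetical helper. Plain `=` never matches NULL, so rows with NULLs in those columns would survive; MySQL's null-safe `<=>` operator covers that case.

```python
import sqlite3


def make_sample() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, number INTEGER, "
        "city TEXT, region TEXT, site INTEGER)"
    )
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?, ?, ?)",
        [
            (1, 12345, "abc", "xyz", 321),
            (2, 67890, "def", "axc", 167),
            (3, 12345, "abc", "xyz", 321),
            (4, 13400, "fff", "aaa", 301),
        ],
    )
    return conn


conn = make_sample()
# Same rule as "n1.id > n2.id": drop a row if an identical row with a smaller
# id exists. Inside the subquery, "data" refers to the outer (target) row.
conn.execute(
    """
    DELETE FROM data
    WHERE EXISTS (
        SELECT 1 FROM data AS n2
        WHERE n2.id < data.id
          AND n2.number = data.number
          AND n2.city = data.city
          AND n2.region = data.region
          AND n2.site = data.site
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT id FROM data ORDER BY id")]
print(remaining)  # [1, 2, 4]
```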

I suggest you follow Drew's comment: in MySQL Workbench, close the database, go to Edit / Preferences / SQL Editor, clear Safe Updates at the bottom, reconnect to the server, select the database, and run the query from Drew's pastie.


edited May 28 '15 at 19:53 by B-Abbasi

answered May 28 '15 at 17:52 by Rahul Tripathi

Oh yes, I totally forgot to mention this: I tried it and it deleted all the records from the table. Not sure why. – B-Abbasi May 28 '15 at 17:56
Duplicate rows are exact replicas of each other except for id, which is the primary key. – B-Abbasi May 28 '15 at 17:58
@B-Abbasi:- Updated my answer. Can you try now? – Rahul Tripathi May 28 '15 at 17:59
check this out: http://pastie.org/10212389 – Drew May 28 '15 at 18:37
In MySQL Workbench you have to close the database, go to Edit / Preferences / SQL Editor, clear Safe Updates at the bottom, reconnect to the server, select the db, and fire off the pastie above. – Drew May 28 '15 at 18:42
@B-Abbasi:- Updated the answer! – Rahul Tripathi May 28 '15 at 18:58
Marc-B marked the post as a duplicate of http://stackoverflow.com/a/3312066/1528290. I tried this approach and it worked like a charm. My query was: alter ignore table data add unique i_number (number) – B-Abbasi May 28 '15 at 19:36
@B-Abbasi:- OK, great. You can proceed with that. Happy coding. If you want, you can post that as an answer, or edit mine and accept it. – Rahul Tripathi May 28 '15 at 19:40
I tried your query and it is still executing; it takes much longer than the one that worked for me. Anyway, thanks for the help. – B-Abbasi May 28 '15 at 19:50
@B-Abbasi:- You are welcome! – Rahul Tripathi May 28 '15 at 19:51
@RahulTripathi The query from your answer executed without an error and deleted all the records once again, so I have unmarked it as the answer. – B-Abbasi May 28 '15 at 20:02
