0

I have been trying to delete duplicate rows from a table but all my efforts either result in error or get stuck during execution. My Table has 16.8 million records including 1.5 million duplicates. Table structure is as follows

--------------------------------------
| id | number | city | region | site |
--------------------------------------
| 1  | 12345  | abc  | xyz    | 321  |
| 2  | 67890  | def  | axc    | 167  |
| 3  | 12345  | abc  | xyz    | 321  |
| 4  | 13400  | fff  | aaa    | 301  |
--------------------------------------

I have tried using some of the approaches suggested in answers here at stack overflow but couldn't find a solution that worked for me.

DELETE n1 FROM data n1, data n2 WHERE n1.id > n2.id AND n1.number = n2.number

Didn't work so I tried following:

DELETE FROM data where data.number in 
(
    SELECT number from data GROUP BY number HAVING COUNT(*)>1
)
LIMIT 1

No use here either so I am stuck. All sorts of suggestions are welcome.

THE SOLUTION THAT WORKED FOR ME

Marc-B marked the post as duplicate of stackoverflow.com/a/3312066/1528290 tried that approach and it worked like a charm. my query was :

alter ignore table data add unique i_number (number)
B-Abbasi
  • 813
  • 2
  • 17
  • 38
  • http://stackoverflow.com/questions/30401571/how-to-remove-duplicate-row-considering-the-arabic-phonetics/30402156#30402156 – Uueerdo May 28 '15 at 17:48
  • @B-Abbasi .. Do you wish to delete all the occurrences of duplicate rows ? Or do you wish to keep a single row for each set of duplicate rows ? – DfrDkn May 28 '15 at 17:48
  • Why did the `DELETE n1 FROM data n1, data n2 WHERE n1.id > n2.id AND n1.number = n2.number` not work? What error was given? – johnjps111 May 28 '15 at 17:52
  • You haven't actually defined what a duplicate record would be. The same number in any record? The same number for a city, region, site combination? – AndySavage May 28 '15 at 17:54
  • @DfrDkn I want to keep single row for each duplicate – B-Abbasi May 28 '15 at 17:59
  • dupes across columns 2,3,4,5, keeping min of id ? ie: keep id=1 kill id=3 in your dataset ? – Drew May 28 '15 at 18:00
  • @AndySavage duplicate rows have same value in all columns except ID which is a primary key. – B-Abbasi May 28 '15 at 18:01
  • @JohnnyStrings the query got stuck, had to kill the query – B-Abbasi May 28 '15 at 19:33

1 Answers1

0

Assuming that the duplication is done on the number column. Try this:

DELETE FROM data 
 WHERE data.number NOT IN (SELECT * 
                    FROM (SELECT MAX(data.id)
                            FROM data n
                        GROUP BY data.number) x)

This will keep one record(which has the highest number) and delete the rest of the records in your table.

EDIT:

I just checked with your query and it worked for me:

DELETE n1 FROM foobarred n1, foobarred n2 
WHERE n1.id > n2.id AND n1.number = n2.number;

SQLFIDDLE DEMO

I guess you should follow Drew's comment: on mysql workbench you have to close database, go to Edit / Preferences / SQL Editor / and @ bottom clear Safe Updates re-connect to server, select db, and fire it off above pastie

Marc-B marked the post as duplicate of stackoverflow.com/a/3312066/1528290 tried this approach and it worked like a charm. my query was :

alter ignore table data add unique i_number (number)
B-Abbasi
  • 813
  • 2
  • 17
  • 38
Rahul Tripathi
  • 168,305
  • 31
  • 280
  • 331