I have been trying to delete duplicate rows from a table but all my efforts either result in error or get stuck during execution. My Table has 16.8 million records including 1.5 million duplicates. Table structure is as follows
--------------------------------------
| id | number | city | region | site |
--------------------------------------
| 1 | 12345 | abc | xyz | 321 |
| 2 | 67890 | def | axc | 167 |
| 3 | 12345 | abc | xyz | 321 |
| 4 | 13400 | fff | aaa | 301 |
--------------------------------------
I have tried using some of the approaches suggested in answers here at stack overflow but couldn't find a solution that worked for me.
DELETE n1 FROM data n1, data n2 WHERE n1.id > n2.id AND n1.number = n2.number
Didn't work so I tried following:
DELETE FROM data where data.number in
(
SELECT number from data GROUP BY number HAVING COUNT(*)>1
)
LIMIT 1
No use here either so I am stuck. All sorts of suggestions are welcome.
THE SOLUTION THAT WORKED FOR ME
Marc-B marked the post as duplicate of stackoverflow.com/a/3312066/1528290 tried that approach and it worked like a charm. my query was :
alter ignore table data add unique i_number (number)