I have a database table that was generated by importing several thousand text documents each very large. For some reason, some files were imported multiple times.
I am trying to remove duplicate rows by using following query:
ALTER IGNORE TABLE mytable ADD UNIQUE INDEX myindex (LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX);
but I was getting an error
1062 - Duplicate entry
Apparently, IGNORE
has been deprecated.
How can I remove duplicates from my database?
I guess I have to do a DELETE
with a JOIN
but I can't figure out the code.
The table is InnoDB
and currently has about 40,000,000 rows (there should be about 17,000,000). Each row has a primary key.
Considering the size, I am hesitant to temporally change the table to MyISAM.