50

I'm trying to remove duplicates from a MySQL table using ALTER IGNORE TABLE + a UNIQUE KEY. The MySQL documentation says:

IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

When I run the query ...

ALTER IGNORE TABLE table ADD UNIQUE INDEX dupidx (field)

... I still get the error #1062 - Duplicate entry 'blabla' for key 'dupidx'.
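(For context, the failing situation can be reproduced with a sketch like the one below; the table `t` and its sample rows are hypothetical, not part of the original post.)

CREATE TABLE t (
    id INT AUTO_INCREMENT PRIMARY KEY,
    field VARCHAR(50)
) ENGINE=InnoDB;

INSERT INTO t (field) VALUES ('blabla'), ('blabla'), ('other');

-- On the affected InnoDB versions this raises error 1062
-- instead of silently dropping the duplicate row:
ALTER IGNORE TABLE t ADD UNIQUE INDEX dupidx (field);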


3 Answers

98

The IGNORE keyword extension to MySQL seems to have a bug in the InnoDB implementation on some versions of MySQL.

You could always convert the table to MyISAM, add the index with IGNORE, and then convert it back to InnoDB:

ALTER TABLE table ENGINE MyISAM;
ALTER IGNORE TABLE table ADD UNIQUE INDEX dupidx (field);
ALTER TABLE table ENGINE InnoDB;

Note: if you have foreign key constraints, this will not work; you will have to remove them first and add them back later.
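A sketch of that extra step, using hypothetical names (`parent`, `child`, `fk_child_parent`, `parent_id`) rather than anything from the original answer:

-- Drop the constraint that points at the table you are about to rebuild
ALTER TABLE child DROP FOREIGN KEY fk_child_parent;

-- Switch engines, add the index with IGNORE, switch back (as above)
ALTER TABLE parent ENGINE MyISAM;
ALTER IGNORE TABLE parent ADD UNIQUE INDEX dupidx (field);
ALTER TABLE parent ENGINE InnoDB;

-- Re-create the constraint
ALTER TABLE child ADD CONSTRAINT fk_child_parent
    FOREIGN KEY (parent_id) REFERENCES parent (id);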

  • In that link to the InnoDB bug there is a suggested workaround to first run `set session old_alter_table=1;`. This worked for me. – Peter May 02 '12 at 03:26
  • Thanks Peter, this seems to be working for me now. I had no idea of this problem: my dev machine is MariaDB, but when I had to run this on production (MySQL 5.5) I ran into it. This Stack Overflow question saved my day! – spidie Mar 05 '13 at 03:18
  • This is quite possibly the worst accepted answer I've ever seen on Stack Overflow. Changing storage engines is a significant undertaking in itself with even medium-sized tables. These three queries could potentially lock up a database server for hours. This is nothing like a solution. – Mikkel Feb 04 '16 at 17:03
  • I think I was a little bit sick just then – John Hunt May 26 '17 at 14:57
28

Or try `set session old_alter_table=1;` (don't forget to set it back!)

See: http://mysqlolyk.wordpress.com/2012/02/18/alter-ignore-table-add-index-always-give-errors/
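A minimal sketch of the whole sequence, reusing the question's placeholder names and assuming a MySQL server old enough to still support ALTER IGNORE TABLE (it was removed in 5.7):

SET SESSION old_alter_table=1;

-- With the old copying ALTER implementation, IGNORE silently drops duplicate rows
ALTER IGNORE TABLE `table` ADD UNIQUE INDEX dupidx (field);

-- Set it back so later ALTERs use the normal algorithm again
SET SESSION old_alter_table=0;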

  • This did work for me. It's lengthy on big tables, but it seems fairly linear: on my machine it processed roughly 2 GiB of data per hour, which worked out to about two days. I wonder how that compares to the accepted solution, which was to convert to MyISAM, add the index and convert back. – Chris Strickland Nov 25 '14 at 17:13
  • Beware! If you're using replication, the `old_alter_table` setting doesn't replicate, so the `ALTER IGNORE TABLE` will fail on the slave and break replication. To fix this, I performed the `ALTER` manually on the slave, then skipped the offending `ALTER TABLE` by using `SET GLOBAL sql_slave_skip_counter = 1`, then resumed replication. – thenickdude Oct 12 '15 at 05:35
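For what it's worth, the slave-side recovery described in that last comment would look roughly like the following sketch (same placeholder names as above; the exact statements are an assumption, not a transcript of what the commenter ran):

-- On the slave, after the replicated ALTER has failed and stopped the SQL thread:
SET SESSION old_alter_table=1;
ALTER IGNORE TABLE `table` ADD UNIQUE INDEX dupidx (field);

-- Skip the broken statement and resume replication
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;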
3

The problem is that you have duplicate data in the field you're trying to index. You'll need to remove the offending duplicates before you can add a unique index.
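If you first want to see which values are actually duplicated, a quick check, reusing the question's placeholder names, could be:

SELECT field, COUNT(*) AS cnt
FROM `table`
GROUP BY field
HAVING cnt > 1;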

One way is to do the following:

   CREATE TABLE tmp_table LIKE table;
   ALTER IGNORE TABLE tmp_table ADD UNIQUE INDEX dupidx (field);
   INSERT IGNORE INTO tmp_table SELECT * FROM table;
   DROP TABLE table;
   RENAME TABLE tmp_table TO table;

This allows you to insert only the unique rows into the new table.

  • No, the IGNORE keyword should take care of those duplicates. That would be the beauty of this solution. See the cited docs in my question. – Philippe Gerber Nov 08 '11 at 19:11