1

I have a table with only a primary key column and a text column. The text column has duplicate values and I want those gone.

What have I tried


I googled around a bit and quickly found what I thought was the answer, which was this:
ALTER IGNORE TABLE tablename ADD UNIQUE INDEX index_name (column1);

But after trying to execute the query I ended up with MySQL saying: "#1062 - Duplicate entry 'v&d' for key 'remove_duplicates'". So after fiddeling with that for a while I found that it didn't work.

After that I tried creating a tmp table and removing the old one, but I couldn't get that going either. I may have gotten the syntax wrong which was:

CREATE table `tmp` like `Tag`

alter table tmp add unique (text)

INSERT IGNORE INTO `tmp` SELECT * FROM `Tag`

RENAME TABLE `Tag` TO `deleteme`
RENAME TABLE `tmp` TO `Tag`

DROP TABLE `deleteme`;



What do I want


A table that has no duplicate values for column 'text'. If anyone sees any errors with my previous methods please let me know, or if you think it should/could be done in a different way please let me know!



Edit


I forgot to mention that I also have a relation hanging on the PK (yeah, quite important I know). Is there some way to "preserve" the relation with the other table as well? I could manually change the id's in the other table if need be, but a way to change that as well would be great.
Bono
  • 4,757
  • 6
  • 48
  • 77
  • For a text value that is dupliacted, do you care which PK value you keep? – Declan_K Aug 20 '13 at 14:40
  • 1
    The solution to add the unique index on name is correct, but there is a [known bug in MySql in the InnoDB Version](http://bugs.mysql.com/bug.php?id=40344) that causes this error message. The problem and a work around is listed [in this answer](http://stackoverflow.com/a/8053812/1048425) – GarethD Aug 20 '13 at 14:44
  • Updated my question to reflect "another" problem. – Bono Aug 20 '13 at 14:57

2 Answers2

10

When you insert into the table, remove the duplicates then:

CREATE table `tmp` like `Tag`

alter table tmp add unique (text)

INSERT INTO `tmp` SELECT min(pk), text FROM `Tag` group by text;

RENAME TABLE `Tag` TO `deleteme`
RENAME TABLE `tmp` TO `Tag`

DROP TABLE `deleteme`;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Quick question, does this preserve relationships with other tables? I forgot to mention that. I know that some of them will be deleted but I could change them manually if need be. – Bono Aug 20 '13 at 14:51
  • 1
    This will preserve the relationships based on the minimum primary key. Relationships using the duplicate will not be preserved. – Gordon Linoff Aug 20 '13 at 15:05
  • I was like why is this so hard? why can't you just use delete from table.... then i read the manual. learned something new. thanks. – Praveen Lobo Aug 20 '13 at 16:00
0

This is how i would done it on a pretty large table i assume you have a column "id"

ALTER TABLE Tag ADD UNIQUE INDEX text_id (text, id);

Create a UNIQUE INDEX on column1. id so the next querys should run faster.

Then if you would like to know how many duplicates you had in the table Tag

SELECT COUNT(*) as "total" - COUNT(DISTINCT text) as "duplicates text" FROM Tag

To get the first row that is UNIQUE you should run (if FIFO is imporant)

INSERT INTO `tmp` SELECT MIN(id), text FROM `Tag` GROUP BY text (Gordon Linoff query)

To get the last row that is UNIQUE you should run (if LIFO is imporant)

INSERT INTO `tmp` SELECT MAX(id), text FROM `Tag` GROUP BY text 

Because off the covering index the copy should be pretty quick, if the server doesn't need to create a disk temporary table...

RENAME TABLE `Tag` TO `deleteme`
RENAME TABLE `tmp` TO `Tag`

DROP TABLE `deleteme`;
Raymond Nijland
  • 11,488
  • 2
  • 22
  • 34