0

I am creating a new unique index of two fields:

ALTER TABLE `tools`
ADD UNIQUE INDEX `name_url` (`name`, `url`);

The problem is that the db has already 500000 tools and other tables refer to it.What's the best strategy to remove the doubles?

maugch
  • 1,276
  • 3
  • 22
  • 46
  • 1
    You mean remove duplicates? – Barmar Aug 25 '14 at 20:42
  • @Barmar yes, sorry I didn't know how to say it. But it's the whole process that I'm interested, not for the single table alone. – maugch Aug 25 '14 at 20:47
  • What other process? Do you mean you need to fix all the foreign keys that refer to the rows that will be removed? – Barmar Aug 25 '14 at 20:47
  • @Barmar yes. I need to know which tools.id get dropped so that I can change the keys on other tables and drop others. – maugch Aug 25 '14 at 20:54

1 Answers1

1

To update another table that refers to this table, do:

UPDATE otherTable AS o
JOIN tools AS t1 on o.tools_id = t1.id
JOIN (SELECT name, url, MIN(id) AS minid
      FROM tools
      GROUP BY name, url) AS t2 
ON t1.name = t2.name AND t1.url = t2.url AND t1.id != t2.minid
SET o.tools_id = t2.minid

This sets all foreign keys to point to the minimum ID for each (name, url) pair. You'll have to do this for every other table that has foreign keys pointing to tools.

Then delete all the extra rows:

DELETE t1.* FROM tools AS t1
JOIN (SELECT name, url, MIN(id) AS minid
      FROM tools
      GROUP BY name, url) AS t2 
ON t1.name = t2.name AND t1.url = t2.url AND t1.id != t2.minid

Once you've done all this you can add the unique index.

Barmar
  • 741,623
  • 53
  • 500
  • 612
  • thanks. That will solve one part of my problems. I have some foreign keys that have to be deleted, but I suppose I could do it by using the second query for that. – maugch Aug 26 '14 at 07:10