Deduplicating linking tables in Mysql

Question

I have inherited a table that provides a link between two other tables that looks like this:

id | product_id | category_id
1 | 10 | 20 
2 | 10 | 21
3 | 34 | 20
4 | 34 | 21
5 | 34 | 21
6 | 10 | 21

In there, it turns out that row 2 and 6, and rows 4 and 5 are pairs.

How can I write a query to delete the redundant rows?

Did you have a look at these links http://stackoverflow.com/questions/3311903/remove-duplicate-rows-in-mysql http://stackoverflow.com/questions/2867530/how-to-remove-duplicate-entries-from-a-mysql-db — Ajeesh, Sep 28 '13 at 11:09

score 1 · Accepted Answer · edited Sep 28 '13 at 11:07

1

You should add a unique key over product_id and category_id, then it will not longer be possible to add redundant rows, as any insert will fail then.

However you can only do that if we remove the redundant rows first. You can find them by grouping like that:

 SELECT id FROM table GROUP BY product_id, category_id HAVING COUNT(*) > 1

After that you can do take the ids from the result separate them by ',' and run this delete query:

DELETE FROM table WHERE id IN({comma separated list of id})

edited Sep 28 '13 at 11:07

BenMorel

34,448
50
182
322

answered Sep 28 '13 at 11:04

Justus Krapp

1,116
9
10

This will work. However, depending on the number of duplicates, `IN()` might not be the best solution. – BenMorel Sep 28 '13 at 11:10

score 1 · Answer 2 · edited May 23 '17 at 12:13

1

ALTER IGNORE TABLE 'tablename' ADD UNIQUE INDEX idx_name (id, product_id, category_id);

This will also ensure there are no more duplicates henceforth in the table.

For more info: Remove duplicate rows in MySQL

edited May 23 '17 at 12:13

Community

1
1

answered Sep 28 '13 at 11:08

aaron

697
4
11

1

This will fail because of the duplicate rows, which have to be removed first! – BenMorel Sep 28 '13 at 11:09
Sorry, missed the `IGNORE` keyword, works indeed! Never thought of this solution before. +1 – BenMorel Sep 28 '13 at 11:22
@Benjamin: yup, I remember implementing this function in one of my recent projects and it worked. Its one of the best solutions i have come across too. – aaron Sep 28 '13 at 11:26

BenMorel · Answer 3 · 2013-09-28T11:23:29.853

1

Create a temporary table that will hold the deduplicated values:

CREATE TABLE y LIKE x;

Insert the deduplicated values:

INSERT INTO y (product_id, category_id)
SELECT product_id, category_id FROM x
GROUP BY product_id, category_id;

Swap the tables:

DROP TABLE x;
ALTER TABLE y RENAME x;

Now you can add a unique key on (product_id, category_id) to prevent further duplicates to be inserted.

edited Sep 28 '13 at 11:23

answered Sep 28 '13 at 11:17

BenMorel

34,448
50
182
322

Deduplicating linking tables in Mysql

3 Answers3