Delete all the duplicates except one

Question

We have a table business_users with a user_id and business_id and we have duplicates. How can I write a query that will delete all duplicates except for one?

Click through the related questions. I found a bunch of ideas to try a few weeks ago when I was looking for this problem. I mixed and matched a few to get the desired results. — MetalFrog, Sep 18 '12 at 18:39
Do you have any primary key, or other unique constraint, on this table? Or are `user_id` and `business_id` the only columns, such that entire rows are duplicated? — ruakh, Sep 18 '12 at 18:43
Looks like a duplicate of http://stackoverflow.com/questions/672702/how-to-delete-duplicates-in-mysql-table?rq=1 — cptScarlet, Sep 18 '12 at 18:44

score 10 · Answer 1 · edited May 23 '17 at 12:10

Completely identical rows

If you want to avoid completely identical rows, as I understood your question at first, then you can select unique rows to a separate table and recreate the table data from that.

CREATE TEMPORARY TABLE tmp SELECT DISTINCT * FROM business_users;
DELETE FROM business_users;
INSERT INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;

Be careful if there are any foreign key constraints referencing this table, though, as the temporary deletion of rows might lead to cascaded deletions elsewhere.

Introducing a unique constraint

If you only care about pairs of user_id and business_id, you probably want to avoid introducing duplicates in the future. You can move the existing data to a temporary table, add a constraint, and then move the table data back, ignoring duplicates.

CREATE TEMPORARY TABLE tmp SELECT * FROM business_users;
DELETE FROM business_users;
ALTER TABLE business_users ADD UNIQUE (user_id, business_id);
INSERT IGNORE INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;

The above answer is based on this answer. The warning about foreign keys applies just as it did in the section above.

One-shot removal

If you only want to execute a single query, without modifying the table structure in any way, and you have a primary key id identifying each row, then you can try the following:

DELETE FROM business_users WHERE id NOT IN
    (SELECT MIN(id) FROM business_users GROUP BY user_id, business_id);

A similar idea was previously suggested by this answer.

If the above request fails, because you are not allowed to read and delete from a table in the same step, you can again use a temporary table:

CREATE TEMPORARY TABLE tmp
SELECT MIN(id) id FROM business_users GROUP BY user_id, business_id;
DELETE FROM business_users WHERE id NOT IN (SELECT id FROM tmp);
DROP TABLE tmp;

If you want to, you can still introduce a uniqueness constraint after cleaning the data in this fashion. To do so, execute the ALTER TABLE line from the previous section.

I like the last one but i get You can't specify target table 'business_users' for update in FROM clause — Matt Elhotiby, Sep 18 '12 at 19:13
Just out of curiousity, for the one-shot removal, why does the first example have `SELECT MIN(id) FROM` and the second one have `SELECT MIN(id) id FROM` (the second has two 'id's)? — Pete, May 25 '16 at 03:36
@Pete: The `MIN(id) id` in the second is an abbreviation of `MIN(id) AS id`: it specifies the name of the column, so that the column in the resulting table isn't literally named `MIN(id)` which would be quite confusing and hard to type. In the first query, the name of the column does not matter, since the subquery is just used as a set. — MvG, May 25 '16 at 08:02

score 3 · Answer 2 · edited May 23 '17 at 10:32

3

Since you have a primary key, you can use that to pick which rows to keep:

delete from business_users
where id not in (
    select id from (
        select min(id) as id -- Make a list of the primary keys to keep
        from business_users
        group by user_id, business_id -- Group by your duplicated row definition
    ) as a -- Derived table to force an implicit temp table
);

In this way, you won't need to create/drop temp tables and such (except the implicit one).

You might want to put a unique constraint on user_id, business_id so you don't have to worry about this again.

edited May 23 '17 at 10:32

Community

1
1

answered Sep 18 '12 at 19:04

Tim Lehner

14,813
4
59
76

looks great but i get this You can't specify target table 'business_users' for update in FROM clause – Matt Elhotiby Sep 18 '12 at 19:11
@Trace, sorry...I've updated to make a subquery work in mysql in this scenario. – Tim Lehner Sep 18 '12 at 19:26
Note: I've read the same suggestion about using a subquery, but it failed in my own test setup. Seems to be due to the fact that I created `business_users` as a temporary table as well, for testing. In that case, the error is phrased `Can't reopen table: 'business_users'` which amounts to pretty much the same problem (at least in my eyes), but cannot be avoided by introducing yet another subquery. – MvG Sep 18 '12 at 19:46
Interesting. Here is my [test sqlfiddle](http://sqlfiddle.com/#!2/29b20/1). Could you possibly give us a better definition of your existing schema that's throwing the error? Perhaps you will need to put the primary keys you wish to keep into a temp table. – Tim Lehner Sep 18 '12 at 19:51

Delete all the duplicates except one

2 Answers2

Completely identical rows

Introducing a unique constraint

One-shot removal