We have a table business_users
with a user_id
and business_id
and we have duplicates.
How can I write a query that will delete all duplicates except for one?

- 57,380
- 22
- 148
- 276

- 43,028
- 85
- 218
- 321
-
1Click through the related questions. I found a bunch of ideas to try a few weeks ago when I was looking for this problem. I mixed and matched a few to get the desired results. – MetalFrog Sep 18 '12 at 18:39
-
2Do you have any primary key, or other unique constraint, on this table? Or are `user_id` and `business_id` the only columns, such that entire rows are duplicated? – ruakh Sep 18 '12 at 18:43
-
Looks like a duplicate of http://stackoverflow.com/questions/672702/how-to-delete-duplicates-in-mysql-table?rq=1 – cptScarlet Sep 18 '12 at 18:44
-
yes there is a `id` as the primary key – Matt Elhotiby Sep 18 '12 at 18:52
2 Answers
Completely identical rows
If you want to avoid completely identical rows, as I understood your question at first, then you can select unique rows to a separate table and recreate the table data from that.
CREATE TEMPORARY TABLE tmp SELECT DISTINCT * FROM business_users;
DELETE FROM business_users;
INSERT INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;
Be careful if there are any foreign key constraints referencing this table, though, as the temporary deletion of rows might lead to cascaded deletions elsewhere.
Introducing a unique constraint
If you only care about pairs of user_id
and business_id
, you probably want to avoid introducing duplicates in the future. You can move the existing data to a temporary table, add a constraint, and then move the table data back, ignoring duplicates.
CREATE TEMPORARY TABLE tmp SELECT * FROM business_users;
DELETE FROM business_users;
ALTER TABLE business_users ADD UNIQUE (user_id, business_id);
INSERT IGNORE INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;
The above answer is based on this answer. The warning about foreign keys applies just as it did in the section above.
One-shot removal
If you only want to execute a single query, without modifying the table structure in any way, and you have a primary key id
identifying each row, then you can try the following:
DELETE FROM business_users WHERE id NOT IN
(SELECT MIN(id) FROM business_users GROUP BY user_id, business_id);
A similar idea was previously suggested by this answer.
If the above request fails, because you are not allowed to read and delete from a table in the same step, you can again use a temporary table:
CREATE TEMPORARY TABLE tmp
SELECT MIN(id) id FROM business_users GROUP BY user_id, business_id;
DELETE FROM business_users WHERE id NOT IN (SELECT id FROM tmp);
DROP TABLE tmp;
If you want to, you can still introduce a uniqueness constraint after cleaning the data in this fashion. To do so, execute the ALTER TABLE
line from the previous section.
-
I like the last one but i get You can't specify target table 'business_users' for update in FROM clause – Matt Elhotiby Sep 18 '12 at 19:13
-
-
Just out of curiousity, for the one-shot removal, why does the first example have `SELECT MIN(id) FROM` and the second one have `SELECT MIN(id) id FROM` (the second has two 'id's)? – Pete May 25 '16 at 03:36
-
2@Pete: The `MIN(id) id` in the second is an abbreviation of `MIN(id) AS id`: it specifies the name of the column, so that the column in the resulting table isn't literally named `MIN(id)` which would be quite confusing and hard to type. In the first query, the name of the column does not matter, since the subquery is just used as a set. – MvG May 25 '16 at 08:02
Since you have a primary key, you can use that to pick which rows to keep:
delete from business_users
where id not in (
select id from (
select min(id) as id -- Make a list of the primary keys to keep
from business_users
group by user_id, business_id -- Group by your duplicated row definition
) as a -- Derived table to force an implicit temp table
);
In this way, you won't need to create/drop temp tables and such (except the implicit one).
You might want to put a unique constraint on user_id, business_id
so you don't have to worry about this again.

- 1
- 1

- 14,813
- 4
- 59
- 76
-
looks great but i get this You can't specify target table 'business_users' for update in FROM clause – Matt Elhotiby Sep 18 '12 at 19:11
-
@Trace, sorry...I've updated to make a subquery work in mysql in this scenario. – Tim Lehner Sep 18 '12 at 19:26
-
Note: I've read the same suggestion about using a subquery, but it failed in my own test setup. Seems to be due to the fact that I created `business_users` as a temporary table as well, for testing. In that case, the error is phrased `Can't reopen table: 'business_users'` which amounts to pretty much the same problem (at least in my eyes), but cannot be avoided by introducing yet another subquery. – MvG Sep 18 '12 at 19:46
-
Interesting. Here is my [test sqlfiddle](http://sqlfiddle.com/#!2/29b20/1). Could you possibly give us a better definition of your existing schema that's throwing the error? Perhaps you will need to put the primary keys you wish to keep into a temp table. – Tim Lehner Sep 18 '12 at 19:51