Using sql to find duplicate records and delete in same operation

Question

I'm using this SQL statement to find duplicate records:

  SELECT id, 
         user_id, 
         activity_type_id, 
         source_id, 
         source_type, 
         COUNT(*) AS cnt
    FROM activities
GROUP BY id, user_id, activity_type_id, source_id, source_type
  HAVING COUNT(*) > 1

However, I want to not only find, but delete in the same operation.

Delete what, exactly? All of them, all but the earliest or latest? — OMG Ponies, May 25 '11 at 03:27
Related: http://stackoverflow.com/questions/6103212/how-do-i-delete-duplicate-rows-and-keep-the-first-row — OMG Ponies, May 25 '11 at 03:37
Is the `id` field really repeated, as your GROUP BY suggests? — pilcrow, May 25 '11 at 04:51

Alex Gitelman · Accepted Answer · 2011-05-25T04:51:37.287

1

delete from activities where id not in (select max(id) from activities group by ....)

Thanks to @OMG Ponies and his other post here is revised solution (but not exactly the same). I assumed here that it does not matter which specific rows are left undeleted. Also the assumption is that id is primary key.

In my example, I just set up one extra column name for testing but it can be easily extended to more columns via GROUP BY clause.

DELETE a FROM activities a 
   LEFT JOIN (SELECT MAX(id) AS id FROM activities GROUP BY name) uniqId 
   ON a.id=uniqId.id WHERE uniqId.id IS NULL;

edited May 25 '11 at 04:51

answered May 25 '11 at 03:29

Alex Gitelman

24,429
7
52
49

If the OP is really GROUPing BY `id` as the SELECT suggests, then he has duplicated ids, and this won't remove them. – pilcrow May 25 '11 at 03:34
error #1093 - `You can't specify target table 'activities' for update in FROM clause` – OMG Ponies May 25 '11 at 03:37
@OMG sorry - I forgot how cranky mysql is. I will play with it and see what comes out. May be some temp table? – Alex Gitelman May 25 '11 at 03:53
@pilcrow If rows are identical in all columns the best shot is to craft something with rownum. But my guess is that id is PK in this case. – Alex Gitelman May 25 '11 at 04:01
@OMG Ponies - revised. Slightly different than your solution but either one is good. – Alex Gitelman May 25 '11 at 04:52
@Alex - assuming `id` is not unique, how would you emulate rownum/rowid in MySQL, and how would you further DELETE using it? – pilcrow May 25 '11 at 05:03
@pilcrow if rows have no PK and are exactly identical, how would you delete all of them but one? The only way I can see is to find out specific values in duplicated rows and delete with `where [all values] and rownum>1`. But I don't see it being possible in one sql. – Alex Gitelman May 25 '11 at 05:15
@Alex, right, it is not possible in one query without some kind of unique row identifier. – pilcrow May 25 '11 at 05:20

Using sql to find duplicate records and delete in same operation

1 Answers1