1

Due to a recent bug, we have a table with multiple duplicate entries.
What I want to do is to find and ideally delete (or perhaps just update) the duplicate rows.

PersonGroup
-----------
id
personId
groupId
type
primary
value

select count(*) cnt from personGroup pg where type="FOO" group by personId having cnt > 1;

yields nearly 20k rows. There should be 0. Each personId should have only one entry for any given type.

I can write a program to fix this scenario but before I do that I'm wondering if there is a purely SQL solution.

kasdega
  • 18,396
  • 12
  • 45
  • 89
  • What version of MySQL? – Shawn Dec 12 '18 at 22:16
  • https://stackoverflow.com/questions/3311903/remove-duplicate-rows-in-mysql – Shawn Dec 12 '18 at 22:27
  • 1
    This is absolutely NOT a duplicate of that other stack question...that question should be titled "AVOID duplicate rows in Mysql" This is a question about how to remove them once we have them. – kasdega Dec 13 '18 at 02:14
  • @Nick please read this question and the other referenced question closer, this is not a duplicate. The accepted answer in the other stack refers to setting an unique index which won't help here. – kasdega Dec 13 '18 at 02:15

1 Answers1

1

Check this query. I think it is pretty simple and yet effective:

delete from persongroup
 where id not in (
    select max(id)
      from persongroup
     group by PersonId);

if your table is too big then you can consider to write this with inner join

 delete persongroup
   from persongroup
  inner join (
     select max(id) as lastId, personId
       from personGroup
      group by personId
     having count(*) > 1) dup on dup.personId = persongroup.personId
  where persongroup.id < dup.lastId;

above query is not tested

Derviş Kayımbaşıoğlu
  • 28,492
  • 4
  • 50
  • 72