9

I made a mistake and I have unwanted duplicates.

I have a table with 4 key fields. A1, k1, k2, k3.

A1 is auto increment and the primary key.

the combination of k1, k2 and k3 is supposed to be unique and I have to delete the duplicate rows before I create a unique index. Some rows have one duplicate, some have many.

SELECT CONCAT(k1, k2, k) AS dup_value
  FROM myviews
 GROUP BY dup_value
HAVING (COUNT(dup_value) > 1)

shows me duplicates values that I need to deal with. But now I don't know how to keep one and delete the rest of each duplicate set.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
sdfor
  • 6,324
  • 13
  • 51
  • 61
  • 2
    There are a LOT of questions regarding this problem. This should be closed as duplicate - FAST. Do a search for "delete duplicate rows." Here's [the first result](http://stackoverflow.com/questions/3777633/delete-duplicate-rows-dont-delete-all-duplicate) for me. – rockerest May 23 '11 at 21:30
  • 1
    @rockerest Interestingly I tried the solution you linked to first and it doesn't work with MySQL. The answer from OMG Ponies did work for me on MySQL so that's where my +1 goes. – jsleuth Aug 13 '13 at 18:36

4 Answers4

16

Backup your data, then...

MySQL supports JOINs in DELETE statements. If you want to keep the first of the duplicates:

DELETE a
  FROM MYVIEWS a
  JOIN (SELECT MIN(t.a1) AS min_a1, t.k1, t.k2, t.k3
          FROM MYVIEWS t
      GROUP BY t.k1, t.k2, t.k3
        HAVING COUNT(*) > 1) b ON b.k1 = a.k1
                              AND b.k2 = a.k2
                              AND b.k3 = a.k3
                              AND b.min_a1 != a.a1

If you want to keep the last of the duplicates:

DELETE a
  FROM MYVIEWS a
  JOIN (SELECT MAX(t.a1) AS max_a1, t.k1, t.k2, t.k3
          FROM MYVIEWS t
      GROUP BY t.k1, t.k2, t.k3
        HAVING COUNT(*) > 1) b ON b.k1 = a.k1
                              AND b.k2 = a.k2
                              AND b.k3 = a.k3
                              AND b.max_a1 != a.a1
Community
  • 1
  • 1
OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
2

You can create a new table with the same structure but empty, then create the unique key on it, then do a INSERT IGNORE / SELECT * FROM the original table into the new table, then delete the original table.

INSERT IGNORE will automatically ignore any primary or unique key issues and just skip the duplicates.

spanky
  • 1,479
  • 2
  • 11
  • 22
  • Not an ideal practice for Production systems -- queries will be depending on the old table, and there can't be more than one table in the database with the same name. This would require an outage. – OMG Ponies May 23 '11 at 21:45
  • Very true. I assumed OP's wasn't in production yet, due to altering keys and deleting tons of rows, but I shouldn't assume! – spanky May 23 '11 at 21:59
2

Someting like this?

DELETE FROM myviews WHERE EXISTS(SELECT CONCAT(k1, k2, k) AS dup_value
FROM myviews
GROUP BY dup_value
HAVING (COUNT(dup_value) > 1));
Automatico
  • 12,420
  • 9
  • 82
  • 110
  • Or maybe I misunderstood your question. Is it duplicity within one tuple? – Automatico May 23 '11 at 21:35
  • this is wrong (at least relative to the question) becouse it deletes all rows on duplicated sets – Saic Siquot Aug 26 '11 at 13:39
  • In addition to deleting all rows instead of all but one (as @LuisSiquot mentions) this will cause an error when run on tables: SQL Error (1093): You can't specify target table 'mytable' for update in FROM clause – ebyrob Nov 10 '16 at 16:52
0

You need a separator in your concat function, because otherwise "a", "b", and "cd" is the same as "abcd", "", "".

Nick ODell
  • 15,465
  • 3
  • 32
  • 66