0

I have a database table (test_table) with an auto-increment column (id) and I want to delete all double (or more) entries when 3 specific columns (A, B, C) have identical entries.

    id    column_A    column_B    column_C
------------------------------------------------
    1     ooo         aaa         uuu
    2     ooo         aaa         uuu
    3     ttt         ppp         uuu
    4     ooo         aaa         uuu
    5     iii         kkk         ccc

In this example, the rows with id 2 and 4 should be deleted after executing the DELETE query.

Regards.

potashin
  • 44,205
  • 11
  • 83
  • 107
lickmycode
  • 2,069
  • 2
  • 19
  • 20
  • Are you saying that the first response is always the correct one that should remain? If this is always the case, can you make column a your "primary key" which won't allow dupes by nature? Not a fix but a best practice... – Silvertiger Jun 08 '14 at 17:35
  • @Silvertiger: I'm cleaning up an existing database actually. It is not important which auto-increment key left, important is only that the doubled entries are beeing removed and one row of them stays. – lickmycode Jun 08 '14 at 17:39

4 Answers4

3
DELETE FROM tbl
WHERE `id` NOT IN ( SELECT * FROM (
                                SELECT MIN(`id`) 
                                FROM tbl
                                GROUP BY `column_A`
                                       , `column_B`
                                       , `column_C` 
                              ) x
              )

SQLFiddle

potashin
  • 44,205
  • 11
  • 83
  • 107
  • 1
    @lickmycode : because you can't use permanent tables in WHERE clause in the `UPDATE`/`INSERT`/`DELETE` queries, so `x` here is to represent it as a temporary table. – potashin Jun 08 '14 at 18:10
1

You can use a self join in delete query, subquery in join will join the rows which are duplicate and condition t.id > tt.id will keep minimum records with id and other duplicates will be deleted

delete t.* from t
join (
select * from t 
group by `column_A`, `column_B`, `column_C`
having count(*) > 1
) tt 
  on(t.`column_A` = tt.`column_A`
          AND t.`column_B` = tt.`column_B`
          AND t.`column_C` = tt.`column_C` 
         AND t.id > tt.id)

I have added one more copy of 5th row to test it

Demo

M Khalid Junaid
  • 63,861
  • 10
  • 90
  • 118
  • 1
    Thank you for putting fiddle together. – potashin Jun 08 '14 at 17:57
  • With this query I am getting an SQL Error `#1205 - Lock wait timeout exceeded; try restarting transaction`. Using InnoDB Tables. Does this query takes more resources than the query provided by Notulysses? – lickmycode Jun 08 '14 at 18:27
0

As a quick possibility I borrowed this from https://www.linkedin.com/groups/How-delete-duplicate-rows-in-78638.S.218942398 :

Step 1: Move the non duplicates (unique tuples) into a temporary table

CREATE TABLE new_table AS
SELECT * FROM old_table WHERE 1 GROUP BY [COLUMN TO remove duplicates BY];

Step 2: delete the old table

DROP TABLE old_table;

Step 3: rename the new_table to the name of the old_table

RENAME TABLE new_table TO old_table;

Then you can edit the new table to have column A be the primary key where no further dupes will occur.

Silvertiger
  • 1,680
  • 2
  • 19
  • 32
  • It looks like that Richa Tayal person in that LinkedIn link actually got her answer from http://stackoverflow.com/a/7582706/456814. –  Sep 18 '15 at 10:59
0
DELETE x
  FROM t x
  LEFT
  JOIN
     ( SELECT column_a
            , column_b
            , column_c
            , MIN(id) min_id
         FROM t
        GROUP 
           BY column_a
            , column_b
            , column_c
     ) y
    ON y.column_a = x.column_a
   AND y.column_b = x.column_b
   AND y.column_c = x.column_c
   AND y.min_id = x.id
 WHERE y.min_id IS NULL 

http://www.sqlfiddle.com/#!2/b1b679/1

Strawberry
  • 33,750
  • 13
  • 40
  • 57