Delete double rows when columns have identical entries

Question

I have a database table (test_table) with an auto-increment column (id) and I want to delete all double (or more) entries when 3 specific columns (A, B, C) have identical entries.

    id    column_A    column_B    column_C
------------------------------------------------
    1     ooo         aaa         uuu
    2     ooo         aaa         uuu
    3     ttt         ppp         uuu
    4     ooo         aaa         uuu
    5     iii         kkk         ccc

In this example, the rows with id 2 and 4 should be deleted after executing the DELETE query.

Regards.

Are you saying that the first response is always the correct one that should remain? If this is always the case, can you make column a your "primary key" which won't allow dupes by nature? Not a fix but a best practice... — Silvertiger, Jun 08 '14 at 17:35
@Silvertiger: I'm cleaning up an existing database actually. It is not important which auto-increment key left, important is only that the doubled entries are beeing removed and one row of them stays. — lickmycode, Jun 08 '14 at 17:39

potashin · Accepted Answer · 2014-06-08T17:54:38.393

3

DELETE FROM tbl
WHERE `id` NOT IN ( SELECT * FROM (
                                SELECT MIN(`id`) 
                                FROM tbl
                                GROUP BY `column_A`
                                       , `column_B`
                                       , `column_C` 
                              ) x
              )

SQLFiddle

edited Jun 08 '14 at 17:54

answered Jun 08 '14 at 17:43

potashin

44,205
11
83
107

1

@lickmycode : because you can't use permanent tables in WHERE clause in the `UPDATE`/`INSERT`/`DELETE` queries, so `x` here is to represent it as a temporary table. – potashin Jun 08 '14 at 18:10

score 1 · Answer 2 · answered Jun 08 '14 at 17:48

1

You can use a self join in delete query, subquery in join will join the rows which are duplicate and condition t.id > tt.id will keep minimum records with id and other duplicates will be deleted

delete t.* from t
join (
select * from t 
group by `column_A`, `column_B`, `column_C`
having count(*) > 1
) tt 
  on(t.`column_A` = tt.`column_A`
          AND t.`column_B` = tt.`column_B`
          AND t.`column_C` = tt.`column_C` 
         AND t.id > tt.id)

I have added one more copy of 5th row to test it

Demo

answered Jun 08 '14 at 17:48

M Khalid Junaid

63,861
10
90
118

1

Thank you for putting fiddle together. – potashin Jun 08 '14 at 17:57
With this query I am getting an SQL Error `#1205 - Lock wait timeout exceeded; try restarting transaction`. Using InnoDB Tables. Does this query takes more resources than the query provided by Notulysses? – lickmycode Jun 08 '14 at 18:27

score 0 · Answer 3 · edited Sep 18 '15 at 10:56

0

As a quick possibility I borrowed this from https://www.linkedin.com/groups/How-delete-duplicate-rows-in-78638.S.218942398 :

Step 1: Move the non duplicates (unique tuples) into a temporary table
CREATE TABLE new_table AS
SELECT * FROM old_table WHERE 1 GROUP BY [COLUMN TO remove duplicates BY];
Step 2: delete the old table
DROP TABLE old_table;
Step 3: rename the new_table to the name of the old_table
RENAME TABLE new_table TO old_table;

Then you can edit the new table to have column A be the primary key where no further dupes will occur.

edited Sep 18 '15 at 10:56

answered Jun 08 '14 at 17:40

Silvertiger

1,680
2
19
32

It looks like that Richa Tayal person in that LinkedIn link actually got her answer from http://stackoverflow.com/a/7582706/456814. – Sep 18 '15 at 10:59

score 0 · Answer 4 · answered Jun 08 '14 at 18:06

DELETE x
  FROM t x
  LEFT
  JOIN
     ( SELECT column_a
            , column_b
            , column_c
            , MIN(id) min_id
         FROM t
        GROUP 
           BY column_a
            , column_b
            , column_c
     ) y
    ON y.column_a = x.column_a
   AND y.column_b = x.column_b
   AND y.column_c = x.column_c
   AND y.min_id = x.id
 WHERE y.min_id IS NULL

http://www.sqlfiddle.com/#!2/b1b679/1

Delete double rows when columns have identical entries

4 Answers4

Demo