23

I read all the relevant duplicate questions/answers, and I found this to be the most relevant answer:

INSERT IGNORE INTO temp(MAILING_ID,REPORT_ID) 
SELECT DISTINCT MAILING_ID, REPORT_ID FROM table_1
;

The problem is that I want to remove duplicates by col1 and col2, but I also want to include all the other fields of table_1 in the insert.

I tried to add all the relevant columns this way:

INSERT IGNORE INTO temp(M_ID, MAILING_ID, REPORT_ID, MAILING_NAME, VISIBILITY, EXPORTED)
SELECT DISTINCT M_ID, MAILING_ID, REPORT_ID, MAILING_NAME, VISIBILITY, EXPORTED
FROM table_1
;


The columns of table_1 are:

M_ID (int, primary key), MAILING_ID (int), REPORT_ID (int), MAILING_NAME (varchar), VISIBILITY (varchar), EXPORTED (int)

But it inserted all rows into temp (including duplicates), presumably because M_ID is the primary key, so every row is distinct across the full column list.

bonCodigo
user838437

7 Answers

39

The best way to delete duplicate rows by multiple columns is the simplest one:

Add a UNIQUE index:

ALTER IGNORE TABLE your_table ADD UNIQUE (field1,field2,field3);

The IGNORE above makes sure that only the first row found is kept and the rest are discarded.

(You can then drop that index if you want to allow duplicates in the future and/or know they won't happen again.)
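For example, a sketch of dropping it afterwards (the index name here is an assumption; MySQL auto-names an unnamed index, usually after its first column, so check SHOW INDEX first):

SHOW INDEX FROM your_table;                  -- find the auto-generated index name
ALTER TABLE your_table DROP INDEX field1;    -- assuming it was named `field1`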

guidod
  • MUCH easier than correlated subqueries. – larrylampco Apr 26 '17 at 03:17
  • As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error. – ianaz Sep 20 '17 at 17:56
  • In MySQL 5.5 there is a bug that can be present; use `set old_alter_table=1` (see the docs at https://dev.mysql.com/doc/refman/5.5/en/alter-table.html). Due to a bug related to Fast Index Creation (Bug #40344), ALTER IGNORE TABLE ... ADD UNIQUE INDEX does not delete duplicate rows: the IGNORE keyword is ignored, and if any duplicate rows exist the operation fails with a Duplicate entry error. A workaround is to set old_alter_table=1 prior to running an ALTER IGNORE TABLE ... ADD UNIQUE INDEX statement. – DanJGer Dec 04 '17 at 17:47
  • How would this work if I want to modify one column first? For example, this is not working: `ALTER IGNORE TABLE mytable ADD UNIQUE (FROM_UNIXTIME(CEIL(UNIX_TIMESTAMP(timestamp) / 5) * 5), id2)` – freddy888 Feb 01 '19 at 15:53
  • ALTER IGNORE has been deprecated. – Derek Gogol Jul 30 '20 at 17:47
24

This works perfectly in any version of MySQL, including 5.7+. It also handles the error "You can't specify target table 'my_table' for update in FROM clause" by using a double-nested subquery. It only deletes ONE duplicate row (the later one) per group, so if you have 3 or more duplicates you can run the query multiple times. It never deletes unique rows.

DELETE FROM my_table
WHERE id IN (
  SELECT calc_id FROM (
    SELECT MAX(id) AS calc_id
    FROM my_table
    GROUP BY identField1, identField2
    HAVING COUNT(id) > 1
  ) temp
)

I needed this query because I wanted to add a UNIQUE index on two columns but there were some duplicate rows that I needed to discard first.
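Once the duplicates are gone, adding that index is a plain ALTER (a sketch using the same column names as above):

ALTER TABLE my_table ADD UNIQUE (identField1, identField2);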

LStarky
15

For MySQL:

DELETE t1 FROM yourtable t1 
  INNER JOIN yourtable t2 WHERE t1.id < t2.id 
    AND t1.identField1 = t2.identField1 
    AND t1.identField2 = t2.identField2;
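The same statement with an explicit ON clause, in case that reads more clearly (same table and column names as above); either form keeps only the row with the largest id in each duplicate group:

DELETE t1 FROM yourtable t1
  INNER JOIN yourtable t2
    ON t1.identField1 = t2.identField1
   AND t1.identField2 = t2.identField2
WHERE t1.id < t2.id;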
stevex
6

You will first need to find your duplicates by grouping on the two fields with a having clause.

    Select identField1, identField2, count(*) FROM yourTable
        GROUP BY identField1, identField2
          HAVING count(*) >1

If this returns what you want, you can then use it as a subquery and delete the matching rows:

  DELETE FROM yourTable WHERE (identField1, identField2) IN (
        SELECT identField1, identField2 FROM (
            SELECT identField1, identField2 FROM yourTable
              GROUP BY identField1, identField2 HAVING count(*) > 1) dup )
Scotch
  • Will this keep one of the duplicate rows? (I want to keep one, not delete any row that has a duplicate) – user838437 Jan 15 '13 at 15:57
  • It will remove all of the duplicates. If you want to keep one, you can select a max or min of a field you aren't aggregating on. A quick google turned up http://stackoverflow.com/questions/3777633/delete-duplicate-rows-dont-delete-all-duplicate?rq=1 which also links to other identical questions. – Scotch Jan 15 '13 at 16:04
  • What if the table only has 2 columns and both columns are being grouped, how do I prevent deleting all duplicates? – CMCDragonkai Feb 05 '16 at 12:31
1

NOTE: This is an alternative, old-school solution.


If you couldn't achieve what you wanted, then you can try my "old school" method:

First, run this query to get the duplicate records:

select   column1,
         column2,
         count(*)
from     table
group by column1,
         column2
having   count(*) > 1
order by count(*) desc

After that, select those results and paste them into Notepad++:

(screenshot: the select query results pasted into Notepad++)

Now, using the find-and-replace feature of Notepad++, replace each line with a "delete" query followed by an "insert" query, like this (from now on, for security reasons, my values will be AAAA).

Special note: add a new line after the last line of your data inside Notepad++, because the regex matches the '\r\n' at the end of each line:


Find what regex: \D*(\d+)\D*(\d+)\D*\r\n

Replace with string: delete from table where column1 = $1 and column2 = $2; insert into table set column1 = $1, column2 = $2;\r\n
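For instance, assuming a pasted line holds just the two numeric column values, say 123 and 456, the replacement turns it into:

delete from table where column1 = 123 and column2 = 456; insert into table set column1 = 123, column2 = 456;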

Finally, paste those queries into your MySQL Workbench query console and execute them. You will then see only one occurrence of each duplicate record.


This answer is for a relation table made up of just two columns and no ID. I think you can adapt it to your situation.

Bahadir Tasdemir
1

You can always get the primary ids of the duplicate rows by grouping on those two fields:

select max(id) as id, count(*) from your_table group by col_a, col_b having count(*) > 1;

and then

delete from your_table where id in (
    select id from (select max(id) as id from your_table group by col_a, col_b having count(*) > 1) t);

Each run deletes one extra copy per group (the one with the highest id); repeat until the first query returns no rows. Using max() this way takes the place of the limit you would otherwise need.

1

In a large data set, if you are selecting multiple columns in the select clause (e.g. select x, y, z from table1) and the requirement is to remove duplicates based on two columns (from the above example, say y and z), then you may use the query below instead of the combination of "group by" and a subquery, which performs badly. Note that this needs MySQL 8.0+ for window functions:

select x, y, z
from (
    select x, y, z, row_number() over (partition by y, z) as index_num
    from table1) main
where main.index_num = 1
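If the goal is to fill the asker's temp table with one row per (MAILING_ID, REPORT_ID) pair, a sketch along the same lines (MySQL 8.0+ only; keeping the row with the lowest M_ID is my assumption, adjust the ORDER BY as needed):

INSERT INTO temp (M_ID, MAILING_ID, REPORT_ID, MAILING_NAME, VISIBILITY, EXPORTED)
SELECT M_ID, MAILING_ID, REPORT_ID, MAILING_NAME, VISIBILITY, EXPORTED
FROM (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY MAILING_ID, REPORT_ID ORDER BY M_ID) AS rn
    FROM table_1 t
) ranked
WHERE rn = 1;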
Dharman
Govind