23

I read all the relevant duplicate questions/answers, and I found this to be the most relevant answer:

INSERT IGNORE INTO temp(MAILING_ID,REPORT_ID) 
SELECT DISTINCT MAILING_ID, REPORT_ID FROM table_1
;

The problem is that I want to remove duplicates by col1 and col2, but I also want to include all the other fields of table_1 in the insert.

I tried to add all the relevant columns this way:

INSERT IGNORE INTO temp(M_ID, MAILING_ID, REPORT_ID, MAILING_NAME, VISIBILITY, EXPORTED)
SELECT DISTINCT M_ID, MAILING_ID, REPORT_ID, MAILING_NAME, VISIBILITY, EXPORTED
FROM table_1
;


The columns of table_1 are:

M_ID (int, primary key), MAILING_ID (int), REPORT_ID (int), MAILING_NAME (varchar), VISIBILITY (varchar), EXPORTED (int)

But it inserted all rows into temp (including duplicates), presumably because M_ID is the primary key, so every row is distinct across the full column list.

bonCodigo
user838437

7 Answers

39

The best way to delete duplicate rows by multiple columns is the simplest one:

Add a UNIQUE index:

ALTER IGNORE TABLE your_table ADD UNIQUE (field1,field2,field3);

The IGNORE above makes sure that only the first row found is kept and the rest are discarded.

(You can then drop that index if you want to allow duplicates in the future and/or know they won't happen again.)
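For example, a sketch of dropping it afterwards (the index name here is an assumption; MySQL auto-names an unnamed index, usually after its first column, so check SHOW INDEX first):

SHOW INDEX FROM your_table;                  -- find the auto-generated index name
ALTER TABLE your_table DROP INDEX field1;    -- assuming it was named `field1`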

guidod
  • MUCH easier than correlated subqueries. – larrylampco Apr 26 '17 at 03:17
  • As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error. – ianaz Sep 20 '17 at 17:56
  • In MySQL 5.5 there is a bug that can be present; use `set old_alter_table=1` (see the docs at https://dev.mysql.com/doc/refman/5.5/en/alter-table.html). Due to a bug related to Fast Index Creation (Bug #40344), ALTER IGNORE TABLE ... ADD UNIQUE INDEX does not delete duplicate rows: the IGNORE keyword is ignored, and if any duplicate rows exist the operation fails with a Duplicate entry error. A workaround is to set old_alter_table=1 prior to running an ALTER IGNORE TABLE ... ADD UNIQUE INDEX statement. – DanJGer Dec 04 '17 at 17:47
  • How would this work if I want to modify one column first? For example, this is not working: `ALTER IGNORE TABLE mytable ADD UNIQUE (FROM_UNIXTIME(CEIL(UNIX_TIMESTAMP(timestamp) / 5) * 5), id2)` – freddy888 Feb 01 '19 at 15:53
  • ALTER IGNORE has been deprecated. – Derek Gogol Jul 30 '20 at 17:47
24

This works perfectly in any version of MySQL, including 5.7+. It also handles the error "You can't specify target table 'my_table' for update in FROM clause" by using a double-nested subquery. It only deletes ONE duplicate row (the later one) per group, so if you have 3 or more duplicates you can run the query multiple times. It never deletes unique rows.

DELETE FROM my_table
WHERE id IN (
  SELECT calc_id FROM (
    SELECT MAX(id) AS calc_id
    FROM my_table
    GROUP BY identField1, identField2
    HAVING COUNT(id) > 1
  ) temp
)

I needed this query because I wanted to add a UNIQUE index on two columns but there were some duplicate rows that I needed to discard first.
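Once the duplicates are gone, adding that index is a plain ALTER (a sketch using the same column names as above):

ALTER TABLE my_table ADD UNIQUE (identField1, identField2);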

LStarky
15

For MySQL:

DELETE t1 FROM yourtable t1 
  INNER JOIN yourtable t2 WHERE t1.id < t2.id 
    AND t1.identField1 = t2.identField1 
    AND t1.identField2 = t2.identField2;
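The same statement with an explicit ON clause, in case that reads more clearly (same table and column names as above); either form keeps only the row with the largest id in each duplicate group:

DELETE t1 FROM yourtable t1
  INNER JOIN yourtable t2
    ON t1.identField1 = t2.identField1
   AND t1.identField2 = t2.identField2
WHERE t1.id < t2.id;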
stevex
6

You will first need to find your duplicates by grouping on the two fields with a having clause.

    Select identField1, identField2, count(*) FROM yourTable
        GROUP BY identField1, identField2
          HAVING count(*) >1

If this returns what you want, you can then use it as a subquery and delete the matching rows:

  DELETE FROM yourTable WHERE (identField1, identField2) IN (
        SELECT identField1, identField2 FROM (
            SELECT identField1, identField2 FROM yourTable
              GROUP BY identField1, identField2 HAVING count(*) > 1) dup )
Scotch
  • Will this keep one of the duplicate rows? (I want to keep one, not delete any row that has a duplicate) – user838437 Jan 15 '13 at 15:57
  • It will remove all of the duplicates. If you want to keep one, you can select a max or min of a field you aren't aggregating on. A quick google turned up http://stackoverflow.com/questions/3777633/delete-duplicate-rows-dont-delete-all-duplicate?rq=1 which also links to other identical questions. – Scotch Jan 15 '13 at 16:04
  • What if the table only has 2 columns and both columns are being grouped, how do I prevent deleting all duplicates? – CMCDragonkai Feb 05 '16 at 12:31
1

NOTE: This is an alternative, old-school solution.


If you couldn't achieve what you wanted, then you can try my "old school" method:

First, run this query to get the duplicate records:

select   column1,
         column2,
         count(*)
from     table
group by column1,
         column2
having   count(*) > 1
order by count(*) desc

After that, select those results and paste them into Notepad++:

(screenshot: the select query results pasted into Notepad++)

Now, using the find-and-replace feature of Notepad++, replace each line with a "delete" query followed by an "insert" query, like this (from now on, for security reasons, my values will be AAAA).

Special note: add a new line after the last line of your data inside Notepad++, because the regex matches the '\r\n' at the end of each line:


Find what regex: \D*(\d+)\D*(\d+)\D*\r\n

Replace with string: delete from table where column1 = $1 and column2 = $2; insert into table set column1 = $1, column2 = $2;\r\n
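For instance, assuming a pasted line holds just the two numeric column values, say 123 and 456, the replacement turns it into:

delete from table where column1 = 123 and column2 = 456; insert into table set column1 = 123, column2 = 456;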

Finally, paste those queries into your MySQL Workbench query console and execute them. You will then see only one occurrence of each duplicate record.


This answer is for a relation table made up of just two columns and no ID. I think you can adapt it to your situation.

Bahadir Tasdemir
1

You can always get the primary ids of the duplicate rows by grouping on those two fields:

select max(id) as id, count(*) from your_table group by col_a, col_b having count(*) > 1;

and then

delete from your_table where id in (
    select id from (select max(id) as id from your_table group by col_a, col_b having count(*) > 1) t);

Each run deletes one extra copy per group (the one with the highest id); repeat until the first query returns no rows. Using max() this way takes the place of the limit you would otherwise need.

1

In a large data set, if you are selecting multiple columns in the select clause (e.g. select x, y, z from table1) and the requirement is to remove duplicates based on two columns (from the above example, say y and z), then you may use the query below instead of the combination of "group by" and a subquery, which performs badly. Note that this needs MySQL 8.0+ for window functions:

select x, y, z
from (
    select x, y, z, row_number() over (partition by y, z) as index_num
    from table1) main
where main.index_num = 1
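If the goal is to fill the asker's temp table with one row per (MAILING_ID, REPORT_ID) pair, a sketch along the same lines (MySQL 8.0+ only; keeping the row with the lowest M_ID is my assumption, adjust the ORDER BY as needed):

INSERT INTO temp (M_ID, MAILING_ID, REPORT_ID, MAILING_NAME, VISIBILITY, EXPORTED)
SELECT M_ID, MAILING_ID, REPORT_ID, MAILING_NAME, VISIBILITY, EXPORTED
FROM (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY MAILING_ID, REPORT_ID ORDER BY M_ID) AS rn
    FROM table_1 t
) ranked
WHERE rn = 1;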
Dharman
Govind