172

I need to DELETE duplicated rows for a specified sid in a MySQL table.

How can I do this with an SQL query?

DELETE (DUPLICATED TITLES) FROM table WHERE SID = "1"

Something like this, but I don't know how to do it.

DineshDB
Ali Demirci
  • Do you only need to do it once or do you need to do it all the time? – Billy ONeal Apr 13 '10 at 14:44
  • Do the records with the duplicate records all have the same data, or are the rest of the fields different from each other? If you have the first option you can simply delete all the records but one, if you have the second option, how are you determining which record you want to keep? – user254875486 Apr 13 '10 at 14:46
  • @Lex First option. @Billy I need to do it all the time. – Ali Demirci Apr 13 '10 at 14:55
  • I found that this answer worked better: http://stackoverflow.com/questions/4685173/delete-all-duplicate-rows-except-for-one-in-mysql – Tali Luvhengo Feb 14 '13 at 10:20
  • 1
    possible duplicate of [Remove duplicate rows in MySQL](http://stackoverflow.com/questions/3311903/remove-duplicate-rows-in-mysql) – Basilevs Sep 16 '14 at 16:16
  • 1
    There are a lot of things that have changed here in various versions of MySQL. Check your MySQL version carefully before leaping down the path of any of the solutions here. – delatbabel Jul 05 '17 at 02:33

25 Answers

222

This removes duplicates in place, without making a new table.

ALTER IGNORE TABLE `table_name` ADD UNIQUE (title, SID)

Note: This only works well if index fits in memory.

TIBI4
user187291
  • 28
    Notice: this would keep the oldest duplicate record and would erase the newer ones. If you want to keep the newest you cannot do this with `ALTER IGNORE`. – Haralan Dobrev Oct 01 '12 at 10:26
  • 10
    Doesn't seem to work with InnoDB. I ran `ALTER TABLE foo ENGINE MyISAM` to work around it, changed the engine back after. – Martin Jul 17 '13 at 23:06
  • 13
    this might fail on MySQL > 5.5, if so use "set session old_alter_table=1;" and "set session old_alter_table=0;" before and after the statement – chillitom Sep 26 '13 at 10:52
  • My db has exact duplicate rows; the schema is id, title and both are repeated. Now what would the query be? ALTER IGNORE TABLE foobar ADD UNIQUE (title, id) is not working, and neither is only (title) or only (id). – Wasim A. Jul 11 '14 at 17:23
  • 31
    [As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error.](http://dev.mysql.com/doc/refman/5.7/en/alter-table.html) – Geoffrey Booth Jan 10 '16 at 22:04
  • 2
    This worked very well for me on MySQL 5.0.67, and I appreciate the nice one-liner. @GeoffreyBooth, I wonder why they removed this behavior in v5.7.4. Any ideas? – StockB Jan 20 '17 at 18:37
  • 2
    As noted by others this is deprecated in versions of MySQL >= 5.7. See here: https://dev.mysql.com/worklog/task/?id=7395 I have no idea why it's deprecated. – delatbabel Jul 05 '17 at 02:36
  • 2
    @delatbabel The reason for deprecating it is given in the page you linked to. – Barmar Aug 09 '17 at 14:51
  • Note: this won't work for removing duplicates if any of the columns are NULL (UNIQUE constraint doesn't view two NULL values as a collision). – ValAyal Nov 22 '17 at 23:24
  • Didn't work for me but produced an error instead (MariaDB 9) – spyro Mar 31 '21 at 07:27
  • Why removed: "*This feature is badly defined (what is the first row?), causes problems for replication, disables online alter for unique index creation and has caused problems with foreign keys (rows removed in parent table).*". Source: link above – Alcalyn Feb 01 '23 at 16:52
145

Suppose you have a table employee, with the following columns:

employee (id, first_name, last_name, start_date)

In order to delete the rows with a duplicate first_name column:

delete from employee
using employee, employee e1
where employee.id > e1.id
    and employee.first_name = e1.first_name;
Shelvacu
Abhijoy_D
  • 1
    The remaining record will have the maximum or minimum id in its duplicating group? – Frozen Flame Mar 11 '16 at 06:50
  • 1
    The remaining record will have the minimum id since it is the only one not meeting the condition to be deleted – Pablo Guerrero Jun 17 '16 at 16:12
  • 1
    Seems like joining `employee` against itself for one index match and one `>` check on an index is going to be slow for large tables. Wouldn't it be better to `SELECT MAX(ID) FROM t GROUP BY unique` and then `JOIN` to an exact match of `ID` to `MAX(ID)`? – ebyrob Nov 10 '16 at 16:31
  • The above query deleted all of my records except one :( – Zolbayar Aug 10 '20 at 03:21
62

A walkthrough for deleting duplicate rows in MySQL in place (assuming you have a timestamp column to sort by):

Create the table and insert some rows:

create table penguins(foo int, bar varchar(15), baz datetime);
insert into penguins values(1, 'skipper', now());
insert into penguins values(1, 'skipper', now());
insert into penguins values(3, 'kowalski', now());
insert into penguins values(3, 'kowalski', now());
insert into penguins values(3, 'kowalski', now());
insert into penguins values(4, 'rico', now());
select * from penguins;
    +------+----------+---------------------+
    | foo  | bar      | baz                 |
    +------+----------+---------------------+
    |    1 | skipper  | 2014-08-25 14:21:54 |
    |    1 | skipper  | 2014-08-25 14:21:59 |
    |    3 | kowalski | 2014-08-25 14:22:09 |
    |    3 | kowalski | 2014-08-25 14:22:13 |
    |    3 | kowalski | 2014-08-25 14:22:15 |
    |    4 | rico     | 2014-08-25 14:22:22 |
    +------+----------+---------------------+
6 rows in set (0.00 sec)

Remove the duplicates in place:

delete a
    from penguins a
    left join(
    select max(baz) maxtimestamp, foo, bar
    from penguins
    group by foo, bar) b
    on a.baz = maxtimestamp and
    a.foo = b.foo and
    a.bar = b.bar
    where b.maxtimestamp IS NULL;
Query OK, 3 rows affected (0.01 sec)
select * from penguins;
+------+----------+---------------------+
| foo  | bar      | baz                 |
+------+----------+---------------------+
|    1 | skipper  | 2014-08-25 14:21:59 |
|    3 | kowalski | 2014-08-25 14:22:15 |
|    4 | rico     | 2014-08-25 14:22:22 |
+------+----------+---------------------+
3 rows in set (0.00 sec)

You're done, duplicate rows are removed, last one by timestamp is kept.

For those of you without a timestamp or unique column.

You don't have a timestamp or a unique index column to sort by? You're living in a state of degeneracy. You'll have to do additional steps to delete duplicate rows.

create the penguins table and add some rows

create table penguins(foo int, bar varchar(15)); 
insert into penguins values(1, 'skipper'); 
insert into penguins values(1, 'skipper'); 
insert into penguins values(3, 'kowalski'); 
insert into penguins values(3, 'kowalski'); 
insert into penguins values(3, 'kowalski'); 
insert into penguins values(4, 'rico'); 
select * from penguins; 
    # +------+----------+ 
    # | foo  | bar      | 
    # +------+----------+ 
    # |    1 | skipper  | 
    # |    1 | skipper  | 
    # |    3 | kowalski | 
    # |    3 | kowalski | 
    # |    3 | kowalski | 
    # |    4 | rico     | 
    # +------+----------+ 

make a clone of the first table and copy into it.

drop table if exists penguins_copy; 
create table penguins_copy as ( SELECT foo, bar FROM penguins );  

#add an autoincrementing primary key: 
ALTER TABLE penguins_copy ADD moo int AUTO_INCREMENT PRIMARY KEY first; 

select * from penguins_copy; 
    # +-----+------+----------+ 
    # | moo | foo  | bar      | 
    # +-----+------+----------+ 
    # |   1 |    1 | skipper  | 
    # |   2 |    1 | skipper  | 
    # |   3 |    3 | kowalski | 
    # |   4 |    3 | kowalski | 
    # |   5 |    3 | kowalski | 
    # |   6 |    4 | rico     | 
    # +-----+------+----------+ 

The max aggregate operates upon the new moo index:

delete a from penguins_copy a left join( 
    select max(moo) myindex, foo, bar 
    from penguins_copy 
    group by foo, bar) b 
    on a.moo = b.myindex and 
    a.foo = b.foo and 
    a.bar = b.bar 
    where b.myindex IS NULL; 

#drop the extra column on the copied table 
alter table penguins_copy drop moo; 
select * from penguins_copy; 

#drop the first table and put the copy table back: 
drop table penguins; 
create table penguins select * from penguins_copy; 

observe and cleanup

drop table penguins_copy; 
select * from penguins;
+------+----------+ 
| foo  | bar      | 
+------+----------+ 
|    1 | skipper  | 
|    3 | kowalski | 
|    4 | rico     | 
+------+----------+ 
    Elapsed: 1458.359 milliseconds 

What's that big SQL delete statement doing?

The penguins table, aliased as 'a', is left joined to a subquery over the same table, aliased as 'b'. The subquery 'b' finds the max timestamp [ or max moo ] grouped by the columns foo and bar. This is matched back to 'a': the left-hand side (foo, bar, baz) has every row in the table, while the right-hand side (maxtimestamp, foo, bar) matches only the row that IS the max for its group.

Every row that is not that max gets a maxtimestamp of NULL. Filter down to those NULL rows and you have the set of all rows, grouped by foo and bar, that do not carry the latest timestamp baz. Delete those.

Make a backup of the table before you run this.
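For example, a snapshot can be one statement (a sketch; penguins_backup is just an illustrative name):

create table penguins_backup as select * from penguins;
-- to roll back: drop table penguins; rename table penguins_backup to penguins;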

Prevent this problem from ever happening again on this table:

If you got this to work and it put out your "duplicate row" fire, great. Now define a new composite unique key on your table (on those two columns) to prevent more duplicates from being added in the first place; a sketch follows below.

Like a good immune system, the bad rows shouldn't even be allowed in to the table at the time of insert. Later on all those programs adding duplicates will broadcast their protest, and when you fix them, this issue never comes up again.
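With the penguins example above, that composite key might look something like this (a sketch; uniq_foo_bar is an illustrative index name, and foo/bar stand in for your own two columns):

alter table penguins add unique index uniq_foo_bar (foo, bar);
-- inserts that would create a (foo, bar) duplicate now fail; use insert ignore or on duplicate key update if callers should not error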

Eric Leschinski
  • 6
    rate up purely for the Madagascar reference! – Michael Wiggins May 26 '15 at 13:47
  • 1
    Rated up since this is a great answer, and great suggestions, thanks Eric worked better than any other answer out there. – johan Sep 26 '16 at 09:24
  • 4
    Note: If your table has an auto increment `ID` column then the `ON` clause needs only to match the `ID` column, nothing else. – ebyrob Nov 10 '16 at 16:37
  • 1
    I like the detailed explanation but...If I understand correctly, this answer makes use of the timestamp to distinguish between records. In that sense, records aren't duplicate. What if you didn't have timestamp to distinguish between records i.e. all cols are same for 2 or more records? – Rsc Rsc Nov 08 '19 at 23:30
  • 1
    @RscRsc If you don't have a timestamp column or unique index to apply the max aggregate to then it looks like you have to duplicate the table, add a unique index, apply the delete statement, then substitute the copied table back to the original. I changed the answer to reflect these instructions. – Eric Leschinski Nov 09 '19 at 03:39
61

The following removes duplicates for all SIDs, not only a single one.

With temp table

CREATE TABLE table_temp AS
SELECT * FROM table GROUP BY title, SID;

DROP TABLE table;
RENAME TABLE table_temp TO table;

Since table_temp is freshly created, it has no indexes. You'll need to recreate them after removing duplicates. You can check what indexes the original table has with SHOW INDEXES IN table.
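For example, if the original table had a primary key on id and an index on SID, re-adding them to the rebuilt table might look like this (a sketch; the real index names and columns come from your SHOW INDEXES output):

SHOW INDEXES IN `table`;
ALTER TABLE `table` ADD PRIMARY KEY (`id`), ADD INDEX `idx_sid` (`SID`);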

Without temp table:

DELETE FROM `table` WHERE id IN (
  SELECT all_duplicates.id FROM (
    SELECT id FROM `table` WHERE (`title`, `SID`) IN (
      SELECT `title`, `SID` FROM `table` GROUP BY `title`, `SID` having count(*) > 1
    )
  ) AS all_duplicates 
  LEFT JOIN (
    SELECT id FROM `table` GROUP BY `title`, `SID` having count(*) > 1
  ) AS grouped_duplicates 
  ON all_duplicates.id = grouped_duplicates.id 
  WHERE grouped_duplicates.id IS NULL
)
Kamil Szot
  • 4
    GROUP-ing produces only one result row for each combination of values of fields that you group by. So duplicates will be removed. – Kamil Szot Apr 13 '10 at 14:54
  • 4
    I like the first way, much more elegant! :B – AgelessEssence Jan 30 '12 at 13:46
  • What if table table has foreign keys or constraints? The first method is ill-advised. – fiacre Aug 20 '15 at 19:56
  • 1
    @fiacre You can disable foreign key checks temporarily: http://stackoverflow.com/questions/15501673/how-to-temporarily-disable-a-foreign-key-constraint-in-mysql You could also be risking removing some of the rows that other tables refer to, but you can control which records are picked for the deduped table by altering the query `SELECT * FROM table GROUP BY title, SID;`. It all depends on how well you know what you are doing. – Kamil Szot Aug 24 '15 at 10:22
  • @KamilSzot, I still believe that copying data to a temp table then dropping the original table should *only* be done in the case where there is no other meta-information in the original table -- it should have that caveat. – fiacre Aug 24 '15 at 20:18
  • instead of using a table as a temp table, why not use an actual `temporary table` as a temp table? – ahnbizcad Jun 28 '16 at 19:05
  • 1
    @ahnbizcad You could use a temporary table, but then you'll have to copy the data back from the temporary table to the regular table. If you use a real table you can just drop the old one with the duplicates and rename the new one, without the duplicates, to the old one's name. – Kamil Szot Jun 30 '16 at 09:42
  • Thanks! Method #1 is elegant. Maybe add a note informing readers to **check** the table structure with `describe 'table'`. For example, I lost the primary key and had to `alter table 'table' modify 'column'...` – MCH Oct 16 '16 at 11:11
  • 1
    The method "without temp table" is closest to the best solution however beware of ONLY_FULL_GROUP_BY handling which changed in MySQL 5.7.5: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html I got this to work by replacing "SELECT id" with "SELECT ANY_VALUE(id) AS id" – delatbabel Jul 05 '17 at 02:38
  • 1
    First solution is better, much more concise. – ashishduh Dec 24 '18 at 06:09
20

After running into this issue myself, on a huge database, I wasn't completely impressed with the performance of any of the other answers. I want to keep only the latest duplicate row, and delete the rest.

In a one-query statement, without a temp table, this worked best for me:

DELETE e.*
FROM employee e
WHERE id IN
 (SELECT id
   FROM (SELECT MIN(id) as id
          FROM employee e2
          GROUP BY first_name, last_name
          HAVING COUNT(*) > 1) x);

The only caveat is that I have to run the query multiple times, but even with that, I found it worked better for me than the other options.

seaders
  • 1
    Pragmatic solution! Worked for me - about 20s for a 2m+ row innodb table. Once I used it a few times and was down to a few offenders with high number of duplicates, finished the job manually. – Troy Wray Feb 28 '18 at 10:05
  • 1
    Worked for me in one sweep, awesome! – Murwa Jul 23 '18 at 22:57
  • It must be executed multiple times if duplicates for any columns are more than 2x – PayteR Jul 10 '19 at 22:53
  • @PayteR that's stated in the answer, "The only caveat is that I have to run the query multiple times" – seaders Jul 12 '19 at 10:24
16

This always seems to work for me:

CREATE TABLE NoDupeTable LIKE DupeTable; 
INSERT NoDupeTable SELECT * FROM DupeTable group by CommonField1,CommonFieldN;

This keeps the lowest ID for each set of dupes, plus all of the non-dupe records.

I've also taken to doing the following so that the dupe issue no longer occurs after the removal:

CREATE TABLE NoDupeTable LIKE DupeTable; 
Alter table NoDupeTable Add Unique `Unique` (CommonField1,CommonField2);
INSERT IGNORE NoDupeTable SELECT * FROM DupeTable;

In other words, I create a duplicate of the first table, add a unique index on the fields I don't want duplicates of, and then do an INSERT IGNORE, which has the advantage of not failing (as a normal INSERT would) the first time it tries to add a duplicate record based on the two fields; it simply ignores any such records.

Moving forward, it becomes impossible to create any duplicate records based on those two fields.

user3649739
  • 1
    Wouldn't you need an `ORDER BY` in the `SELECT` to be sure which record actually makes it over to the `NoDupeTable`? – ebyrob Nov 10 '16 at 16:40
  • @ebyrob I believe that, unless otherwise instructed, it will select the lowest ID in the absence of other criteria. Of course `ORDER by ID Asc` couldn't hurt, so I'll edit my answer nonetheless. – user3649739 Feb 28 '17 at 19:02
  • @ebyrob Sorry my bad. Order by won't work in this select to my knowledge. An Order by at the end of the select will only order the duplicates found by the lowest ID found in each pair. Alternately you could do a `Select Max(ID)` and then `Order by Max(ID)` but all that would do is reverse the order of the insert. To grab the highest ID would require I believe a more complex select join as, regardless of how you order above you will be grabbing the field values from the lower ID. – user3649739 Feb 28 '17 at 19:17
  • Actually, not sure what I was thinking with order by. You'd definitely want `MAX(ID)` or `MIN(ID)` and column names instead of `*` in the `SELECT FROM DupeTable` though, otherwise you'll just get one of the `ID`'s randomly. In fact, many SQLs and even MySQL strict requires calling an aggregate function on each column not specified in the `GROUP BY` clause. – ebyrob Feb 28 '17 at 23:51
  • @ebyrob On testing, Max(ID) and Min(ID) do nothing except return the ID of the Max or Min record. In each case it grabs the same record. So if I had two records with fields `ID,First,Last,Notes` and records `1,Bob,Smith,NULL` and `2,Bob,Smith,Arrears`, then doing a `SELECT Max(ID), First,Last,Notes FROM DupeTable group by First,Last` with either Max(ID) or Min(ID) would return the same record, 1, except with a different ID. Max(ID) would return `2,Bob,Smith,NULL` and Min(ID) would return `1,Bob,Smith,NULL`. To get the second record with `Arrears` in the notes requires a join I believe. – user3649739 Mar 02 '17 at 00:28
10

The following works for all tables

CREATE TABLE `noDup` LIKE `Dup` ;
INSERT `noDup` SELECT DISTINCT * FROM `Dup` ;
DROP TABLE `Dup` ;
ALTER TABLE `noDup` RENAME `Dup` ;
Rahul Sharma
M.B.Miri
  • I assume this only works if you have exactly identical records, not duplicates that accumulate over time with different values, for example timestamps – Sobhan Atar Jan 07 '22 at 16:49
6

Here is a simple answer:

delete a from target_table a left JOIN (select max(id_field) as id, field_being_repeated
    from target_table GROUP BY field_being_repeated) b 
    on a.field_being_repeated = b.field_being_repeated
      and a.id_field = b.id
    where b.id is null;
Floern
Ted Celestin
  • It's a good answer, except for a small mistake: it should be `and a.id_field = b.id` – Vikrant Goel Jun 22 '16 at 22:21
  • The `LEFT JOIN` to `b` only needs to compare `b.id` = `a.id_field` assuming `field_id` is a unique auto increment ID. so `a.field_being_repeated = b.field_being_repeated` is extraneous. (also `b.id_field` doesn't exist in this query it's `b.id`. – ebyrob Nov 10 '16 at 16:43
6

This works for me to remove old records:

delete from table where id in 
(select min(e.id)
    from (select * from table) e 
    group by column1, column2
    having count(*) > 1
); 

You can replace min(e.id) with max(e.id) to remove the newest records instead.

richardhell
6
delete p from 
product p
inner join (
    select max(id) as id, url from product 
    group by url 
    having count(*) > 1
) unik on unik.url = p.url and unik.id != p.id;
temonehm
6

I find Werner's solution (the UPDATE IGNORE approach elsewhere on this page) to be the most convenient because it works regardless of the presence of a primary key, doesn't mess with tables, uses future-proof plain SQL, and is very understandable.

As I stated in my comment, that solution hasn't been properly explained though. So this is mine, based on it.

1) add a new boolean column

alter table mytable add tokeep boolean;

2) add a constraint on the duplicated columns AND the new column

alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep);

3) set the boolean column to true. This will succeed only on one of the duplicated rows because of the new constraint

update ignore mytable set tokeep = true;

4) delete rows that have not been marked as tokeep

delete from mytable where tokeep is null;

5) drop the added column

alter table mytable drop tokeep;

I suggest that you keep the constraint you added, so that new duplicates are prevented in the future.
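If you want that remaining safeguard to be explicit after dropping tokeep (MySQL normally just removes the dropped column from the preventdupe index, leaving it on the two data columns), recreating it could look like this (a sketch using the names from above):

alter table mytable drop index preventdupe;
alter table mytable add constraint preventdupe unique (mycol1, mycol2);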

xtian
4

This procedure will remove all duplicates (including multiples) in a table, keeping the last duplicate. This is an extension of Retrieving last record in each group.

Hope this is useful to someone.

DROP TABLE IF EXISTS UniqueIDs;
CREATE Temporary table UniqueIDs (id Int(11));

INSERT INTO UniqueIDs
    (SELECT T1.ID FROM Table T1 LEFT JOIN Table T2 ON
    (T1.Field1 = T2.Field1 AND T1.Field2 = T2.Field2 #Comparison Fields 
    AND T1.ID < T2.ID)
    WHERE T2.ID IS NULL);

DELETE FROM Table WHERE id NOT IN (SELECT ID FROM UniqueIDs);
Simon
4

Another easy way... using UPDATE IGNORE:

You have to use an index on one or more columns (of type index). Create a new temporary reference column (not part of the index). In this column, you mark the unique rows by updating it with the IGNORE clause. Step by step:

Add a temporary reference column to mark the uniques:

ALTER TABLE `yourtable` ADD `unique` VARCHAR(3) NOT NULL AFTER `lastcolname`;

=> this will add a column to your table.

Update the table, trying to mark everything as unique, but ignoring possible errors due to the duplicate key issue (those records will be skipped):

UPDATE IGNORE `yourtable` SET `unique` = 'Yes' WHERE 1;

=> you will find your duplicate records will not be marked as unique = 'Yes', in other words only one of each set of duplicate records will be marked as unique.

Delete everything that's not unique:

DELETE FROM `yourtable` WHERE `unique` <> 'Yes';

=> This will remove all duplicate records.

Drop the column...

ALTER TABLE `yourtable` DROP `unique`;
Werner
  • I think this is the best solution because it doesn't mess with tables and it uses plain simple sql. One only thing should be made clear: the `unique` column MUST be added to a unique constraint together with the columns that are currently duplicated, otherwise the whole thing doesn't work because the SET `unique` = 'Yes' would never fail. – xtian Apr 16 '18 at 13:46
  • Also be aware that `unique` is a mysql keyword. So it has to have the backticks (as already correctly displayed). Using another word for the column might be more convenient. – Torsten Jul 09 '19 at 06:55
3

If you want to keep the row with the lowest id value:

 DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id > n2.id AND n1.email = n2.email;

If you want to keep the row with the highest id value:

 DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id < n2.id AND n1.email = n2.email;
TanvirChowdhury
2

Deleting duplicates from MySQL tables is a common issue that usually comes with specific needs. In case anyone is interested, here (Remove duplicate rows in MySQL) I explain how to use a temporary table to delete MySQL duplicates in a reliable and fast way, which is also valid for handling big data sources (with examples for different use cases).

Ali, in your case, you can run something like this:

-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;

-- add a unique constraint    
ALTER TABLE tmp_table1 ADD UNIQUE(sid, title);

-- scan over the table to insert entries
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY sid;

-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;
0
delete from `table` where `table`.`SID` in 
    (
    select SID from (
        select t.SID from `table` t join `table` t1 on t.title = t1.title where t.SID > t1.SID
    ) x
)
william.eyidi
Patrick
0

Love @eric's answer but it doesn't seem to work if you have a really big table (I'm getting "The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay" when I try to run it). So I limited the join query to only consider the duplicate rows and I ended up with:

DELETE a FROM penguins a
    LEFT JOIN (SELECT COUNT(baz) AS num, MIN(baz) AS keepBaz, foo
        FROM penguins
        GROUP BY foo HAVING num > 1) b
        ON a.baz != b.keepBaz
        AND a.foo = b.foo
    WHERE b.foo IS NOT NULL

The WHERE clause in this case allows MySQL to ignore any row that doesn't have a duplicate and will also ignore if this is the first instance of the duplicate so only subsequent duplicates will be ignored. Change MIN(baz) to MAX(baz) to keep the last instance instead of the first.

Kudit
0

This works for large tables:

 CREATE Temporary table duplicates AS select max(id) as id, url from links group by url having count(*) > 1;

 DELETE l from links l inner join duplicates ld on ld.id = l.id WHERE ld.id IS NOT NULL;

To delete the oldest instead, change max(id) to min(id).

Mugoma J. Okomba
0

This will make the column column_name into a primary key and ignore all errors in the meantime, so it will delete the rows with a duplicate value for column_name.

ALTER IGNORE TABLE `table_name` ADD PRIMARY KEY (`column_name`);
0

I think this will work by basically copying the table, emptying it, and then putting only the distinct values back into it, but please double-check it before doing this on large amounts of data.

Creates a carbon copy of your table

create table temp_table like oldtablename;
insert temp_table select * from oldtablename;

Empties your original table

DELETE FROM oldtablename;

Copies all distinct values from the copied table back to your original table

INSERT oldtablename SELECT * from temp_table group by firstname,lastname,dob

Deletes your temp table.

Drop Table temp_table

You need to group by all fields that you want to keep distinct.

0
DELETE T2
FROM   table_name T1
JOIN   table_name T2 ON (T1.title = T2.title AND T2.ID > T1.ID)
slfan
Nav
0

here is how I usually eliminate duplicates (a sketch of these steps follows the list)

  1. add a temporary column, named whatever you want (I'll refer to it as active)
  2. group by the fields that you think shouldn't be duplicated and set their active to 1; grouping will select only one row of each set of duplicate values for those columns (it will not select the duplicates)
  3. delete the ones with active zero
  4. drop the column active
  5. optionally (if it fits your purposes), add a unique index on those columns so duplicates cannot appear again
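A minimal sketch of those steps, assuming the duplicate-defining columns are col1 and col2, the table is mytable, and a unique id column exists (all names illustrative):

alter table mytable add active tinyint(1) not null default 0;   -- step 1: temporary flag column

update mytable m
join (select min(id) as keep_id from mytable group by col1, col2) k
  on m.id = k.keep_id
set m.active = 1;                                                -- step 2: mark one row per (col1, col2) group

delete from mytable where active = 0;                            -- step 3: remove the unmarked duplicates

alter table mytable drop active;                                 -- step 4: drop the helper column

alter table mytable add unique (col1, col2);                     -- step 5 (optional): block future duplicates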
Yazgan
-2

You could just use a DISTINCT clause to select the "cleaned up" list (and here is a very easy example on how to do that).
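For example, DISTINCT could feed a rebuilt copy of the table, along the lines of the temp-table answers above (a sketch; table_clean is an illustrative name, and any indexes or keys must be re-added because CREATE TABLE ... AS does not copy them). Note that SELECT DISTINCT * only collapses rows that are identical in every column:

CREATE TABLE table_clean AS SELECT DISTINCT * FROM `table`;
DROP TABLE `table`;
RENAME TABLE table_clean TO `table`;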

Rosamunda
  • How does that answer the question? Using `DISTINCT` you lose any information about duplicates you might have had in the first place. Can you show a way to delete duplicates using it? – luk2302 Jun 12 '16 at 17:28
-3

Could it work if you count them and then add a limit to your delete query, leaving just one?

For example, if you have exactly two copies of a row, write your query like this:

DELETE FROM table WHERE SID = 1 LIMIT 1;
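Spelled out as the two steps it implies (a sketch; the title value is illustrative, and the LIMIT must be the duplicate count minus one):

SELECT COUNT(*) FROM `table` WHERE SID = 1 AND title = 'some title';  -- suppose this returns 3
DELETE FROM `table` WHERE SID = 1 AND title = 'some title' LIMIT 2;   -- delete all but one copy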
calvinf
-5

There are just a few basic steps when removing duplicate data from your table:

  • Back up your table!
  • Find the duplicate rows
  • Remove the duplicate rows

Here is the full tutorial: https://blog.teamsql.io/deleting-duplicate-data-3541485b3473
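A hedged sketch of the find and remove steps for the question's table (title and SID come from the question; the id column is an assumption):

-- find the duplicate groups
SELECT title, SID, COUNT(*) AS cnt
FROM `table`
GROUP BY title, SID
HAVING COUNT(*) > 1;

-- remove them, keeping the lowest id in each (title, SID) group
DELETE t1 FROM `table` t1
JOIN `table` t2
  ON t1.title = t2.title AND t1.SID = t2.SID AND t1.id > t2.id;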

Can
  • Does it work if only the unique id is different? – Andrew Jun 22 '17 at 18:57
  • By default the method described here does not work for MySQL versions > 5.7.5. This is because of the handling of ONLY_FULL_GROUP_BY. See here: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html – delatbabel Jul 05 '17 at 02:35