How to delete duplicate rows with SQL?

Question

I have a table with some rows in it. Every row has a date field. Right now, there may be duplicates of a date. I need to delete all the duplicates and keep only the row with the highest id for each date. How is this possible using a SQL query?

Now:

date      id
'07/07'   1
'07/07'   2
'07/07'   3
'07/05'   4
'07/05'   5

What I want:

date      id
'07/07'   3
'07/05'   5

From the data you sent, you end up with two not three rows! 07/05 is repeated. — notnoop, Jul 23 '09 at 19:45

score 32 · Answer 1 · answered Jul 23 '09 at 19:52

DELETE FROM table WHERE id NOT IN
    (SELECT MAX(id) FROM table GROUP BY date);
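
As a quick way to try the statement above on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name `events` is an assumption for illustration (the original uses the placeholder `table`, which is a reserved word in most databases).

```python
import sqlite3

# Hypothetical table name "events"; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, date TEXT)")
conn.executemany(
    "INSERT INTO events (id, date) VALUES (?, ?)",
    [(1, "07/07"), (2, "07/07"), (3, "07/07"), (4, "07/05"), (5, "07/05")],
)

# The subquery yields the highest id per date; every other row is deleted.
conn.execute(
    "DELETE FROM events WHERE id NOT IN "
    "(SELECT MAX(id) FROM events GROUP BY date)"
)

remaining = conn.execute("SELECT date, id FROM events ORDER BY id").fetchall()
print(remaining)  # [('07/07', 3), ('07/05', 5)]
```

Running the DELETE inside a transaction and checking the remaining rows before committing is a cheap safety net on real data.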

Georg Schölly

Wow, did I go a roundabout way or what? This is definitely the best way to do this. – Eric Jul 23 '09 at 19:53
I thought your way was a bit too complicated... But honestly, I wanted to do it first using 3 queries instead of just this one. – Georg Schölly Jul 23 '09 at 19:55
This query is also useful for this answer: SELECT date, COUNT(date) AS NumOccurrences FROM table GROUP BY date HAVING ( COUNT(date) > 1 ) – djangofan Jul 23 '09 at 22:16
@djangofan: almost, you just have to select id instead of COUNT(date). – Georg Schölly Jul 24 '09 at 04:41
That however wouldn't work in MySQL due to its stupid limitations on sub-selects. – Jan 31 '13 at 15:06
Depending on your data, this query can take a LOT longer than the one suggested by iddqd. – daSong May 29 '14 at 17:12
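
Two points from the comments above can be sketched together: previewing which dates are duplicated before deleting anything, and wrapping the subquery in a derived table, which is the commonly used way around MySQL refusing a subquery that selects from the table being deleted from. This is only a sketch: it runs against SQLite to check the logic, the table name `events` and the alias `keep` are assumptions, and the MySQL behaviour is not exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name "events" with the sample rows from the question.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, date TEXT)")
conn.executemany(
    "INSERT INTO events (id, date) VALUES (?, ?)",
    [(1, "07/07"), (2, "07/07"), (3, "07/07"), (4, "07/05"), (5, "07/05")],
)

# Preview: list every date that occurs more than once, with its count.
duplicates = conn.execute(
    "SELECT date, COUNT(date) AS NumOccurrences FROM events "
    "GROUP BY date HAVING COUNT(date) > 1 ORDER BY date"
).fetchall()
print(duplicates)  # [('07/05', 2), ('07/07', 3)]

# Same delete as the answer, but the aggregate is wrapped in a derived
# table (alias "keep", an assumption) so the outer statement does not
# select directly from the table it is deleting from.
delete_sql = """
DELETE FROM events
WHERE id NOT IN (
    SELECT id FROM (
        SELECT MAX(id) AS id FROM events GROUP BY date
    ) AS keep
)
"""
conn.execute(delete_sql)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM events ORDER BY id")]
print(remaining_ids)  # [3, 5]
```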

score 6 · Answer 2 · answered Aug 02 '11 at 16:06

I don't have comment rights, so here's my comment as an answer in case anyone comes across the same problem:

In SQLite3, there is an implicit numerical primary key called "rowid", so the same query would look like this:

DELETE FROM table WHERE rowid NOT IN
(SELECT MAX(rowid) FROM table GROUP BY date);

This will work with any table, even if it does not contain a primary key column called "id".
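
A minimal sketch of the rowid variant, using Python's built-in sqlite3 module on a table that has no id column at all. The table name `readings` and the `value` column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table without any explicit id column; SQLite still
# assigns an implicit rowid to each row of an ordinary table.
conn.execute("CREATE TABLE readings (date TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO readings (date, value) VALUES (?, ?)",
    [("07/07", "a"), ("07/07", "b"), ("07/07", "c"), ("07/05", "d"), ("07/05", "e")],
)

# Keep only the row with the highest rowid for each date.
conn.execute(
    "DELETE FROM readings WHERE rowid NOT IN "
    "(SELECT MAX(rowid) FROM readings GROUP BY date)"
)

remaining = conn.execute(
    "SELECT date, value FROM readings ORDER BY rowid"
).fetchall()
print(remaining)  # [('07/07', 'c'), ('07/05', 'e')]
```

Note that tables created with WITHOUT ROWID have no rowid, so this only applies to ordinary SQLite tables.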

score 3 · Answer 3 · answered Sep 23 '10 at 11:31

For MySQL, PostgreSQL, and Oracle, a self join is the better way.

PostgreSQL:
DELETE FROM table t1 USING table t2 WHERE t1.date=t2.date AND t1.id<t2.id;

MySQL:
DELETE FROM table
USING table, table as vtable
WHERE (table.id < vtable.id)
AND (table.date=vtable.date)

SQL aggregate operations (MAX, GROUP BY) are almost always very slow.
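
The self-join idea (delete a row whenever another row with the same date and a higher id exists) can be tried out with Python's built-in sqlite3 module. SQLite does not accept the DELETE ... USING forms shown above, so this sketch swaps in a correlated EXISTS subquery that expresses the same condition. The table name `events` and the alias `newer` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name "events" with the sample rows from the question.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, date TEXT)")
conn.executemany(
    "INSERT INTO events (id, date) VALUES (?, ?)",
    [(1, "07/07"), (2, "07/07"), (3, "07/07"), (4, "07/05"), (5, "07/05")],
)

# Delete each row for which a "newer" row (same date, higher id) exists.
# This is the self-join condition written as a correlated subquery,
# not the DELETE ... USING syntax from the answer.
delete_sql = """
DELETE FROM events
WHERE EXISTS (
    SELECT 1 FROM events AS newer
    WHERE newer.date = events.date
      AND newer.id > events.id
)
"""
conn.execute(delete_sql)

remaining = conn.execute("SELECT date, id FROM events ORDER BY id").fetchall()
print(remaining)  # [('07/07', 3), ('07/05', 5)]
```

Whether this beats the MAX/GROUP BY version depends on the database, the indexes, and the data, so it is worth timing both on a copy of the real table.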
