0

Let's say we have the following SQL query that returns duplicates of emails

SELECT email, COUNT(email) 
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )

In the case above, how can we actually delete those duplicate rows? or only one of the duplicates so they are no longer duplicates?

Arya
  • 8,473
  • 27
  • 105
  • 175
  • Assuming your primary key is a `user` how would you identify which `user` you would want to delete of the two? – JNevill Apr 11 '16 at 20:23
  • Possible duplicate of [How to delete duplicate entries?](http://stackoverflow.com/questions/1746213/how-to-delete-duplicate-entries) – Vamsi Prabhala Apr 11 '16 at 20:25

2 Answers2

3

One method uses ctid:

delete from users
    where ctid not in (select min(ctid)
                       from users
                       group by email
                      );

This deletes all but one row for each email. ctid is an internal row identifier. It would be better to use a user-defined primary key column.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2

This method below will allow you to remove the records from your table even if you don't have a primary key or unique identifier.

WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY email) AS RN
FROM users
)

DELETE FROM CTE WHERE RN > 1
Mike Deluca
  • 1,295
  • 2
  • 18
  • 41