SQL remove rows by duplicates in the specified column

Question

I have data like this:

id,username,whatevermorecolumns
1,cat,more data here..
2,kitty,..
3,cat,..
4,kitten,..

I want to remove the rows with a duplicated username, so I expect a result like this:

id,username,whatevermorecolumns
1,cat,more data here..
2,kitty,..
4,kitten,..

One open question is which of id 1 or id 3 should be removed. I would like to know how to control that too, but what I'm really trying to do is count() the rows after filtering the SELECT result, so it isn't a big problem here.

I googled, read some Stack Overflow posts, and tried GROUP BY and DISTINCT, but I still don't have a good solution. Maybe that's because it's PostgreSQL? Thanks for your help.

score 2 · Accepted Answer · answered May 16 '21 at 15:06

Have you tried DISTINCT ON?

A similar case on Stack Overflow: "Remove duplicate rows based on field in a select query with PostgreSQL?"

Example:

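SELECT DISTINCT ON (username) id, username, whatevermorecolumns
FROM tablename
-- WHERE ..   (your filter conditions, if any)
ORDER BY username, id;   -- keeps the row with the smallest id per username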
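PostgreSQL's DISTINCT ON keeps the first row of each username group, in ORDER BY order. The sketch below is runnable and uses Python's standard-library sqlite3 module with an in-memory table. The table name `t` and the sample rows are assumptions taken from the question. SQLite has no DISTINCT ON, so the sketch substitutes the ROW_NUMBER() window function, which needs SQLite 3.25 or later. PostgreSQL also supports ROW_NUMBER(), and it keeps the same rows as `DISTINCT ON (username) ... ORDER BY username, id`.

```python
import sqlite3


def first_row_per_username(conn: sqlite3.Connection) -> list:
    # Number the rows inside each username group, smallest id first,
    # then keep only row number 1 from every group.
    # (Assumed table name: t. ROW_NUMBER() stands in for DISTINCT ON.)
    query = """
        SELECT id, username, whatevermorecolumns
        FROM (
            SELECT id, username, whatevermorecolumns,
                   ROW_NUMBER() OVER (PARTITION BY username ORDER BY id) AS rn
            FROM t
        ) AS numbered
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, username TEXT, whatevermorecolumns TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "cat", "more data here.."), (2, "kitty", ".."), (3, "cat", ".."), (4, "kitten", "..")],
)
print(first_row_per_username(conn))
# [(1, 'cat', 'more data here..'), (2, 'kitty', '..'), (4, 'kitten', '..')]
```

This also answers the asker's side question about which duplicate survives. To keep the row with the largest id instead, reverse the ordering: `ORDER BY id DESC` inside the window, or `ORDER BY username, id DESC` with DISTINCT ON in PostgreSQL.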

score 1 · Answer 2 · answered May 16 '21 at 15:14

DISTINCT ON should do the job. Given the sample data:

id	username	whatevermorecolumns
1	cat	more data here..
2	kitty	..
3	cat	..
4	kitten	..

SELECT DISTINCT ON (username)
  id,
  username,
  whatevermorecolumns
FROM tablename
ORDER BY username, id;

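DISTINCT ON returns one row for each distinct value of the expression(s) in the parentheses after DISTINCT ON. Which row you get from each group is unpredictable unless ORDER BY puts the desired row first, and the DISTINCT ON expressions must match the leftmost ORDER BY expressions.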
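The asker ultimately wants a count of the deduplicated rows. Counting distinct usernames gives the same number without returning the rows themselves. The small sketch below uses Python's sqlite3 module, and the in-memory table `t` and its rows are assumptions based on the question. `COUNT(DISTINCT username)` is standard SQL and also works in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO t (id, username) VALUES (?, ?)",
    [(1, "cat"), (2, "kitty"), (3, "cat"), (4, "kitten")],
)

# One row survives per distinct username, so counting distinct usernames
# gives the number of rows DISTINCT ON (username) would return
# (NULL usernames aside: COUNT(DISTINCT ...) ignores NULLs).
(unique_count,) = conn.execute("SELECT COUNT(DISTINCT username) FROM t").fetchone()

# Equivalent form that counts the rows of a deduplicated subquery.
(subquery_count,) = conn.execute(
    "SELECT COUNT(*) FROM (SELECT username FROM t GROUP BY username) AS dedup"
).fetchone()

print(unique_count, subquery_count)  # 3 3
```

The subquery form is handy when the deduplicated SELECT already exists and you only want to wrap it in `SELECT COUNT(*) FROM (...)`.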

score 1 · Answer 3 · answered May 16 '21 at 18:38

If you want to actually remove the rows from the table, you can use DELETE:

delete from t
    where t.id > (select min(t2.id) from t t2 where t2.username = t.username);

This removes every duplicate except the row with the smallest id for each username.
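Because DELETE is destructive, it can help to try the statement on a throwaway table first. The sketch below runs the same statement against an in-memory SQLite table through Python's sqlite3 module and lists the surviving ids. The table `t` and its rows are assumptions taken from the question. The correlated subquery compares each row's id with the smallest id for the same username.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO t (id, username) VALUES (?, ?)",
    [(1, "cat"), (2, "kitty"), (3, "cat"), (4, "kitten")],
)

# Delete every row whose id is larger than the smallest id for the same username.
conn.execute(
    """
    DELETE FROM t
    WHERE t.id > (SELECT MIN(t2.id) FROM t t2 WHERE t2.username = t.username)
    """
)
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
print(remaining)  # [1, 2, 4]
```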

If you just want a result set with no duplicates, then DISTINCT ON, as recommended in the other answers, is the right approach.
