
I have data like this:

id,username,whatevermorecolumns
1,cat,more data here..
2,kitty,..
3,cat,..
4,kitten,..

and I want to remove the rows that have a duplicated username, so I expect a result like this:

id,username,whatevermorecolumns
1,cat,more data here..
2,kitty,..
4,kitten,..

There is the question of which row, id 1 or id 3, should be removed, and I would love to know that too. But what I'm actually trying to do is count() the rows after filtering the select result, so it is not a big problem here.

I googled and read some Stack Overflow posts and tried GROUP BY and DISTINCT, but I still have no good idea how to do this. Maybe because it's PostgreSQL? Thanks for your help.

ー PupSoZeyDe ー

3 Answers


Have you tried DISTINCT ON?

A similar case to yours on Stack Overflow: "Remove duplicate rows based on field in a select query with PostgreSQL?"

Example:

SELECT DISTINCT ON (username) id, username, whatevermorecolumns
FROM tablename
WHERE ..
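
On the question of whether id 1 or id 3 is kept: without an ORDER BY, PostgreSQL picks an unpredictable row from each group. A minimal sketch, assuming the table is called tablename (a placeholder, not a name from the question) and that the row with the lowest id should win:

```sql
-- tablename is a placeholder name.
-- The leftmost ORDER BY expressions must match the DISTINCT ON list.
SELECT DISTINCT ON (username) id, username, whatevermorecolumns
FROM tablename
ORDER BY username, id;  -- within each username, the smallest id sorts first and is kept
```

With the sample data from the question this should keep ids 1, 2 and 4. Using `id DESC` instead would keep the newest row of each username.
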
gupsevopse

DISTINCT ON should do the job.

id,username,whatevermorecolumns
1,cat,more data here..
2,kitty,..
3,cat,..
4,kitten,..

SELECT DISTINCT ON (username)
  id,
  username,
  whatevermorecolumns
FROM tablename;

DISTINCT ON ensures you get one row for each unique combination of the expressions specified in the parentheses (line 1 of the code).
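
Since the stated goal is to count the rows after de-duplication, the DISTINCT ON query can be wrapped in a subquery, or the count can be taken directly. A sketch under the same assumption that the table is called tablename:

```sql
-- Count the rows that DISTINCT ON would return.
SELECT count(*)
FROM (
  SELECT DISTINCT ON (username) id
  FROM tablename
) AS deduped;

-- Shortcut when only the number is needed.
-- Note: count(DISTINCT ...) skips NULL usernames, while DISTINCT ON
-- returns one row for the NULL group, so the two agree only if
-- username is never NULL.
SELECT count(DISTINCT username)
FROM tablename;
```

For the sample data in the question, both queries should return 3.
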

AbhiKP

If you want to remove the rows, then you can modify the table using delete:

delete from t
    where t.id > (select min(t2.id) from t t2 where t2.username = t.username);

This removes all but the row with the smallest id.
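
To try this safely, the statement can be run inside a transaction against a scratch copy of the sample data. The following is only a sketch: the temporary table definition and column types are assumptions, since the question does not show the real schema.

```sql
BEGIN;

-- Hypothetical scratch table holding the sample rows from the question.
CREATE TEMP TABLE t (
  id int PRIMARY KEY,
  username text,
  whatevermorecolumns text
);
INSERT INTO t VALUES
  (1, 'cat', 'more data here..'),
  (2, 'kitty', '..'),
  (3, 'cat', '..'),
  (4, 'kitten', '..');

-- Preview: the same predicate as the delete, so this lists the rows that would go.
SELECT *
FROM t
WHERE t.id > (SELECT min(t2.id) FROM t t2 WHERE t2.username = t.username);

-- Remove every row that is not the smallest id for its username.
DELETE FROM t
WHERE t.id > (SELECT min(t2.id) FROM t t2 WHERE t2.username = t.username);

-- Inspect what is left.
SELECT id, username FROM t ORDER BY id;

ROLLBACK;  -- discard the scratch table and the delete
```

On this sample the preview should list only id 3, leaving ids 1, 2 and 4 after the delete. Replacing ROLLBACK with COMMIT would make the change permanent.
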

If you just want a result set with no duplicates, then the other answers recommending distinct on are the right answer.

Gordon Linoff