Remove duplicates in Select query based on one column

Question

I want to select without duplicate ids and keep row '5d' and not '5e' in select statement.

table

id | name
1  | a
2  | b
3  | c
5  | d
5  | e

I tried:

SELECT id, name 
FROM table t
INNER JOIN (SELECT DISTINCT id FROM table) t2 ON t.id = t2.id

Duplicate: https://stackoverflow.com/questions/4662464/how-to-select-only-the-first-rows-for-each-unique-value-of-a-column — Edney Holder, Mar 28 '19 at 21:13
Possible duplicate of [How can I remove duplicate rows?](https://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows) — Dale K, Mar 28 '19 at 21:16
Make sure to always specify what rule you want applied. " and keep row 5d and not 5e" only tells us that 'd' wins over 'e' and we don't know why. What if it were not 'd' and 'e' but 'a' and 'z' or whatever other name? Is 'd' better than 'e', because it comes before 'e' in the alphabet? Or because it is not a vowel? So, in case of duplicates: how to decide for the value to keep? — Thorsten Kettner, Mar 28 '19 at 21:23
Define "earliest"? By definition a table is an unordered set. That means you have to provide order with the data. — Sean Lange, Mar 28 '19 at 21:33

score 6 · Answer 1 · answered Mar 28 '19 at 21:11

6

For the given example an aggregation using min() would work.

SELECT id,
       min(name) name
       FROM table
       GROUP BY id;

answered Mar 28 '19 at 21:11

sticky bit

36,626
12
31
42

score 5 · Accepted Answer · answered Mar 28 '19 at 21:31

You can also use ROW_NUMBER():

SELECT id, name
FROM (
    SELECT id, name, ROW_NUMBER() OVER(PARTITION BY id ORDER BY name) rn 
    FROM mytable
) x
WHERE rn = 1

This will retain the record that has the smallest name (so '5d' will come before '5e'). With this technique, you can also use a sort criteria on another column that the one where duplicates exists (which an aggregate query with MIN() cannot do). Also, queries using window functions usually perform better than the equivalent aggregate query.

score 3 · Answer 3 · answered Mar 28 '19 at 21:12

3

If you want to keep the row with the smallest name then you can use not exists:

select t.* from tablename t
where not exists (
  select 1 from tablename
  where id = t.id and name < t.name
)

answered Mar 28 '19 at 21:12

forpas

160,666
10
38
76

I am a little confused by this snippet, could you please a bit explain how it works? – bentz123 Aug 10 '21 at 11:15
@bentz123 this query selects only the rows of the tables for which there is no other row with the same `id` but less `name` (alphabetically). So it returns distinct ids with the least `name` of each id. – forpas Aug 10 '21 at 11:19

Remove duplicates in Select query based on one column

3 Answers3