SQL - select distinct only on one column

Question

I have searched far and wide for an answer to this problem. I'm using a Microsoft SQL Server, suppose I have a table that looks like this:

+--------+---------+-------------+-------------+
| ID     | NUMBER  | COUNTRY     | LANG        |
+--------+---------+-------------+-------------+
| 1      | 3968    | UK          | English     |
| 2      | 3968    | Spain       | Spanish     |
| 3      | 3968    | USA         | English     |
| 4      | 1234    | Greece      | Greek       |
| 5      | 1234    | Italy       | Italian     |

I want to perform one query which only selects the unique 'NUMBER' column (whether is be the first or last row doesn't bother me). So this would give me:

+--------+---------+-------------+-------------+
| ID     | NUMBER  | COUNTRY     | LANG        |
+--------+---------+-------------+-------------+
| 1      | 3968    | UK          | English     |
| 4      | 1234    | Greece      | Greek       |

How is this achievable?

It's not really clear what you want. Do you want one row (like you asked) or two (like your second code block)? Or is it really one number *per country*? — ashes999, Dec 05 '13 at 17:01
You really searched "far and wide" for this? This is a pretty common query. — Kyle Hale, Dec 05 '13 at 17:02
Also a duplicate of http://stackoverflow.com/questions/966176/select-distinct-on-one-column?rq=1 ... and http://stackoverflow.com/questions/5021693/distinct-for-only-one-column?lq=1 ... and http://stackoverflow.com/questions/1785634/select-distinct-on-one-column-return-multiple-other-columns-sql-server?lq=1 ... — Kyle Hale, Dec 05 '13 at 17:05
And http://stackoverflow.com/questions/6127338/sql-select-distinct-but-return-all-columns?rq=1 ... and http://stackoverflow.com/questions/6628900/select-a-subgroup-of-records-by-one-distinct-column?rq=1 .. (sorry to pick on you, but not that sorry!) — Kyle Hale, Dec 05 '13 at 17:07

score 58 · Answer 1 · answered Dec 05 '13 at 17:15

A very typical approach to this type of problem is to use row_number():

select t.*
from (select t.*,
             row_number() over (partition by number order by id) as seqnum
      from t
     ) t
where seqnum = 1;

This is more generalizable than using a comparison to the minimum id. For instance, you can get a random row by using order by newid(). You can select 2 rows by using where seqnum <= 2.

Kyle Hale · Accepted Answer · 2016-01-16T02:49:24.837

41

Since you don't care, I chose the max ID for each number.

select tbl.* from tbl
inner join (
select max(id) as maxID, number from tbl group by number) maxID
on maxID.maxID = tbl.id

Query Explanation

 select 
    tbl.*  -- give me all the data from the base table (tbl) 
 from 
    tbl    
    inner join (  -- only return rows in tbl which match this subquery
        select 
            max(id) as maxID -- MAX (ie distinct) ID per GROUP BY below
        from 
            tbl 
        group by 
            NUMBER            -- how to group rows for the MAX aggregation
    ) maxID
        on maxID.maxID = tbl.id -- join condition ie only return rows in tbl 
                                -- whose ID is also a MAX ID for a given NUMBER

edited Jan 16 '16 at 02:49

answered Dec 05 '13 at 17:01

Kyle Hale

7,912
1
37
58

2

Anyway I didn't down-vote because of that, but rather because the self-join to an aggregate against the same table becomes exponentially more expensive (in terms of reads) as the table gets larger. [Gordon's answer](http://stackoverflow.com/a/20406419/61305), in addition to being more flexible, is also more efficient (or at least no worse). – Aaron Bertrand Dec 05 '13 at 17:25
3

Doesn't it just get more expensive polynomially? How are you getting 'exponentially'? – Anon Dec 05 '13 at 20:38
I answered it first then had an epiphany. Anyway, I voted for Gordon's answer, too, since most of the duplicates use the same windowing method. – Kyle Hale Dec 05 '13 at 21:05
Doesn't seem to work for me, perhaps I'm using it wrong. It would be nice if when people answered SQL questions they gave some kind of explanation for what is happening in their query. Otherwise it's difficult to learn from the answer. – Rampant Creative Group Jan 12 '16 at 13:36
@RampantCreativeGroup I added an explanation, hope it helps. – Kyle Hale Jan 12 '16 at 20:15

score 0 · Answer 3 · edited May 18 '15 at 12:37

0

You will use the following query:

SELECT * FROM [table] GROUP BY NUMBER;

Where [table] is the name of the table.

This provides a unique listing for the NUMBER column however the other columns may be meaningless depending on the vendor implementation; which is to say they may not together correspond to a specific row or rows.

edited May 18 '15 at 12:37

shA.t

16,580
5
54
111

answered Dec 05 '13 at 17:06

Gedalya

899
4
16
28

8

@Gedalya This is possible only in MySQL. It will return an error in SQL server. – sahalMoidu Dec 05 '13 at 17:12

SQL - select distinct only on one column

3 Answers3

Linked