44

I have searched far and wide for an answer to this problem. I'm using a Microsoft SQL Server, suppose I have a table that looks like this:

+--------+---------+-------------+-------------+
| ID     | NUMBER  | COUNTRY     | LANG        |
+--------+---------+-------------+-------------+
| 1      | 3968    | UK          | English     |
| 2      | 3968    | Spain       | Spanish     |
| 3      | 3968    | USA         | English     |
| 4      | 1234    | Greece      | Greek       |
| 5      | 1234    | Italy       | Italian     |

I want to perform one query which only selects the unique 'NUMBER' column (whether is be the first or last row doesn't bother me). So this would give me:

+--------+---------+-------------+-------------+
| ID     | NUMBER  | COUNTRY     | LANG        |
+--------+---------+-------------+-------------+
| 1      | 3968    | UK          | English     |
| 4      | 1234    | Greece      | Greek       |

How is this achievable?

Kyle Hale
  • 7,912
  • 1
  • 37
  • 58
Jason Lipo
  • 751
  • 2
  • 11
  • 24
  • It's not really clear what you want. Do you want one row (like you asked) or two (like your second code block)? Or is it really one number *per country*? – ashes999 Dec 05 '13 at 17:01
  • You really searched "far and wide" for this? This is a pretty common query. – Kyle Hale Dec 05 '13 at 17:02
  • 1
    Also a duplicate of http://stackoverflow.com/questions/966176/select-distinct-on-one-column?rq=1 ... and http://stackoverflow.com/questions/5021693/distinct-for-only-one-column?lq=1 ... and http://stackoverflow.com/questions/1785634/select-distinct-on-one-column-return-multiple-other-columns-sql-server?lq=1 ... – Kyle Hale Dec 05 '13 at 17:05
  • 1
    And http://stackoverflow.com/questions/6127338/sql-select-distinct-but-return-all-columns?rq=1 ... and http://stackoverflow.com/questions/6628900/select-a-subgroup-of-records-by-one-distinct-column?rq=1 .. (sorry to pick on you, but not that sorry!) – Kyle Hale Dec 05 '13 at 17:07

3 Answers3

58

A very typical approach to this type of problem is to use row_number():

select t.*
from (select t.*,
             row_number() over (partition by number order by id) as seqnum
      from t
     ) t
where seqnum = 1;

This is more generalizable than using a comparison to the minimum id. For instance, you can get a random row by using order by newid(). You can select 2 rows by using where seqnum <= 2.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
41

Since you don't care, I chose the max ID for each number.

select tbl.* from tbl
inner join (
select max(id) as maxID, number from tbl group by number) maxID
on maxID.maxID = tbl.id

Query Explanation

 select 
    tbl.*  -- give me all the data from the base table (tbl) 
 from 
    tbl    
    inner join (  -- only return rows in tbl which match this subquery
        select 
            max(id) as maxID -- MAX (ie distinct) ID per GROUP BY below
        from 
            tbl 
        group by 
            NUMBER            -- how to group rows for the MAX aggregation
    ) maxID
        on maxID.maxID = tbl.id -- join condition ie only return rows in tbl 
                                -- whose ID is also a MAX ID for a given NUMBER
Kyle Hale
  • 7,912
  • 1
  • 37
  • 58
  • 2
    Anyway I didn't down-vote because of that, but rather because the self-join to an aggregate against the same table becomes exponentially more expensive (in terms of reads) as the table gets larger. [Gordon's answer](http://stackoverflow.com/a/20406419/61305), in addition to being more flexible, is also more efficient (or at least no worse). – Aaron Bertrand Dec 05 '13 at 17:25
  • 3
    Doesn't it just get more expensive polynomially? How are you getting 'exponentially'? – Anon Dec 05 '13 at 20:38
  • I answered it first then had an epiphany. Anyway, I voted for Gordon's answer, too, since most of the duplicates use the same windowing method. – Kyle Hale Dec 05 '13 at 21:05
  • Doesn't seem to work for me, perhaps I'm using it wrong. It would be nice if when people answered SQL questions they gave some kind of explanation for what is happening in their query. Otherwise it's difficult to learn from the answer. – Rampant Creative Group Jan 12 '16 at 13:36
  • @RampantCreativeGroup I added an explanation, hope it helps. – Kyle Hale Jan 12 '16 at 20:15
0

You will use the following query:

SELECT * FROM [table] GROUP BY NUMBER;

Where [table] is the name of the table.

This provides a unique listing for the NUMBER column however the other columns may be meaningless depending on the vendor implementation; which is to say they may not together correspond to a specific row or rows.

shA.t
  • 16,580
  • 5
  • 54
  • 111
Gedalya
  • 899
  • 4
  • 16
  • 28