7

I know that the query below causes the error - ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

SELECT DISTINCT city
FROM HR.Employees
WHERE country = N'USA' AND region = N'WA'
ORDER BY birthdate;

What is the reason for this ? What is actually happening ? If I don't use DISTINCT or just add birthdate in SELECT or ORDER BY city only, it gives me some output, but not an error. Is it because SELECT DISTINCT city only gives a result set with cities and nothing else ?

EDIT -(I think this may be an answer to my question)

SELECT city
FROM HR.Employees
WHERE country = N'USA' AND region = N'WA'

Consider the above query. Not sure, but I think this is how it works - Behind the scenes, SQL Server probably has a result set with all columns, but we are only shown the city column which I will call the "displayed set" . So, ordering by city or any other column is valid.

What happens to the result set when we use SELECT DISTINCT city instead ? SQL server does not have only the rows with DISTINCT cities in its result set. It has the entire result set, like the one generated by a SELECT * query. But, it shows only distinct rows based on city. Now can this displayed set be sorted based on birthdate ? No.

In a city with many employees, ie many birthdates, SQL server cannot tell which birthdate should be used for ordering the displayed set. That is why it displays an error message.

Steam
  • 9,368
  • 27
  • 83
  • 122
  • I saw Joel Coehoorn's answer, but I am not sure if that is the best way to think about it - http://stackoverflow.com/questions/265628/order-by-items-must-appear-in-the-select-list-if-select-distinct-is-specified?rq=1 – Steam Sep 03 '13 at 05:19
  • 2
    There can be multiple employees in a city so what birth date should it use when ordering the city? – Martin Smith Sep 03 '13 at 05:22
  • @MartinSmith - I agree. My only doubt was if DISTINCT generates a result set which only has rows with distinct cities. I think that was the error in my thinking. So, I made an edit to my question. Please see it and tell me if it makes sense. – Steam Sep 03 '13 at 06:07

2 Answers2

26

A query with SELECT DISTINCT can be rewritten using GROUP BY. So the query:

SELECT DISTINCT city
FROM HR.Employees
WHERE country = N'USA' AND region = N'WA' ;

is equivalent to:

SELECT city
FROM HR.Employees
WHERE country = N'USA' AND region = N'WA'
GROUP BY city ;

and you can't use ORDER BY birthdate here either. The reason is the same for both queries. There may be many (more than one) rows with same city but different birthdate. Which one should be used for the ordering (if it was allowed?)

You can however use aggregate functions with a GROUP BY query:

SELECT city
FROM HR.Employees
WHERE country = N'USA' AND region = N'WA'
GROUP BY city 
ORDER BY MIN(birthdate) ;               -- or MAX(birthdate)
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • Thanks. Can you also see my update and tell me if it makes sense. I'd like to know if my guesswork makes sense. – Steam Sep 03 '13 at 06:04
  • 1
    Yes, I agree with those. Except for the last line ("In a city with many employees,...") Even if all cities have only one employee, the query would not be allowed. Because potentially, new employees can be inserted, i.e. SQL-Server does not examine the actual data but the query structure and enforced/declared constraints (and in this case, not even those). – ypercubeᵀᴹ Sep 03 '13 at 06:30
1
try this: 
select city 
    from (
    SELECT city, min(birthdate) as birthdate
    FROM HR.Employees
    WHERE country = N'USA' AND region = N'WA'
    group by city
          ) as t
    order by birthdate
xurca
  • 2,426
  • 3
  • 24
  • 29
  • 2
    I was not looking for a workaround or alternative code, but for an explanation of how things work. – Steam Sep 03 '13 at 06:16