Why doesn't this query result change when DISTINCT keyword is used?

Question

Why is the result of the two query the same? (I'm using northwind database.)

SELECT      ContactTitle
        ,   COUNT(CustomerID) AS NumberOfCustomer
FROM        dbo.Customers
WHERE       ContactTitle LIKE '%sales%'
GROUP BY    ContactTitle
HAVING      COUNT(*) >= 5
ORDER BY    NumberOfCustomer desc

and

SELECT 
DISTINCT    ContactTitle
        ,   COUNT(CustomerID) AS NumberOfCustomer
FROM        dbo.Customers
WHERE       ContactTitle LIKE '%sales%'
GROUP BY    ContactTitle
HAVING      COUNT(*) >= 5
ORDER BY    NumberOfCustomer desc

the result is:

ContactTitle           NumberOfCustomer
---------------------  ----------------
Sales Representative         17
Sales Manager                11
Sales Associate               7
Sales Agent                   5

In my own understanding, the second query gets the distinct titles and count its records so i'm expecting that the result would be none because each title has only record count of 1. Am i right?

score 7 · Answer 1 · answered May 02 '12 at 14:33

7

DISTINCT is done after the other operations. First it does the GROUP BY which already makes each row distinct, so the DISTINCT is redundant.

answered May 02 '12 at 14:33

Chad

7,279
2
24
34

score 3 · Answer 2 · answered May 02 '12 at 14:33

3

The distinct is applied to the title and the count. Once your select is done computing, it then creates the distinct list from that.

answered May 02 '12 at 14:33

Donald Miner

38,889
8
95
118

score 3 · Answer 3 · answered May 02 '12 at 14:34

3

DISTINCT will filter duplicate records from the result set. Since in this case there are no duplicate records, the DISTINCT has no effect.

answered May 02 '12 at 14:34

Justin Ethier

131,333
52
229
284

score 3 · Accepted Answer · 2012-05-02T14:51:45.427

3

This is how the query execution works. In your second statement, DISTINCT doesn't do any additional function because your GROUP BY containing the same column name ContactTitle has already carried out that operation for you.

1. FROM
2. WHERE
3. GROUP BY <-- You have specified the column `ContactTitle`, 
-- which means the results would be grouped by that column to product unique 
--result.
4. HAVING
5. SELECT <-- Adding DISTINCT on ContactTitle column here doesn't make much 
-- difference and it is actually redundant. DISTINCT is applied to the whole
-- row but the resultset already contains distinct rows grouped by the column 
-- `ContactTitle`.
6. ORDER BY

edited May 02 '12 at 14:51

answered May 02 '12 at 14:38

1

Correct, except for the *"Adding DISTINCT on ContactTitle column"*. The `DISTINCT` is applied to the whole row, not a column. – ypercubeᵀᴹ May 02 '12 at 14:47
thank you for the illustration. how do you understand the query execution plan? – SkyDrive May 02 '12 at 14:54
Short story here (shameless ad): [SELECT: order or processing](http://stackoverflow.com/questions/6545664/using-case-expression-column-in-where-clause/6545685#6545685) – ypercubeᵀᴹ May 02 '12 at 15:47

Why doesn't this query result change when DISTINCT keyword is used?

4 Answers4