3

Why is the result of the two query the same? (I'm using northwind database.)

SELECT      ContactTitle
        ,   COUNT(CustomerID) AS NumberOfCustomer
FROM        dbo.Customers
WHERE       ContactTitle LIKE '%sales%'
GROUP BY    ContactTitle
HAVING      COUNT(*) >= 5
ORDER BY    NumberOfCustomer desc

and

SELECT 
DISTINCT    ContactTitle
        ,   COUNT(CustomerID) AS NumberOfCustomer
FROM        dbo.Customers
WHERE       ContactTitle LIKE '%sales%'
GROUP BY    ContactTitle
HAVING      COUNT(*) >= 5
ORDER BY    NumberOfCustomer desc

the result is:

ContactTitle           NumberOfCustomer
---------------------  ----------------
Sales Representative         17
Sales Manager                11
Sales Associate               7
Sales Agent                   5

In my own understanding, the second query gets the distinct titles and count its records so i'm expecting that the result would be none because each title has only record count of 1. Am i right?

SkyDrive
  • 1,445
  • 4
  • 15
  • 24

4 Answers4

7

DISTINCT is done after the other operations. First it does the GROUP BY which already makes each row distinct, so the DISTINCT is redundant.

Chad
  • 7,279
  • 2
  • 24
  • 34
3

The distinct is applied to the title and the count. Once your select is done computing, it then creates the distinct list from that.

Donald Miner
  • 38,889
  • 8
  • 95
  • 118
3

DISTINCT will filter duplicate records from the result set. Since in this case there are no duplicate records, the DISTINCT has no effect.

Justin Ethier
  • 131,333
  • 52
  • 229
  • 284
3

This is how the query execution works. In your second statement, DISTINCT doesn't do any additional function because your GROUP BY containing the same column name ContactTitle has already carried out that operation for you.

1. FROM
2. WHERE
3. GROUP BY <-- You have specified the column `ContactTitle`, 
-- which means the results would be grouped by that column to product unique 
--result.
4. HAVING
5. SELECT <-- Adding DISTINCT on ContactTitle column here doesn't make much 
-- difference and it is actually redundant. DISTINCT is applied to the whole
-- row but the resultset already contains distinct rows grouped by the column 
-- `ContactTitle`.
6. ORDER BY
  • 1
    Correct, except for the *"Adding DISTINCT on ContactTitle column"*. The `DISTINCT` is applied to the whole row, not a column. – ypercubeᵀᴹ May 02 '12 at 14:47
  • thank you for the illustration. how do you understand the query execution plan? – SkyDrive May 02 '12 at 14:54
  • Short story here (shameless ad): [SELECT: order or processing](http://stackoverflow.com/questions/6545664/using-case-expression-column-in-where-clause/6545685#6545685) – ypercubeᵀᴹ May 02 '12 at 15:47