When the performance of Distinct and Group By are different?

Question

I know in simple queries the performance and execution plans of the Distinct and Group By are almost the same.

e.g.

SELECT Name FROM NamesTable GROUP BY Name
SELECT DISTINCT Name FROM NamesTable

But I've read in some scenarios their performance would be different e.g. in subqueries, etc?

So, could you make some examples or explain some scenarios where their performance are different?

Many thanks

When they do the same thing, their performance is the same. When they do different things, their performance can be different. If there are specific examples you're interested in, you should include them in your question. — AakashM, May 18 '12 at 10:30
Why the downvotes? It's an answerable, programming-related question (though the answer is "no, we can't"). — Quassnoi, May 18 '12 at 10:46

score 16 · Accepted Answer · answered May 18 '12 at 11:04

16

If you include a calculated value in the field list you will see a difference in the execution plan.

select Value,
       getdate()
from YourTable
group by UnitID

select distinct
       Value,
       getdate()
from YourTable

The group by query aggregates before it computes the scalar value. The distinct query computes the scalar value before the aggregate.

answered May 18 '12 at 11:04

Mikael Eriksson

So, this could potentially produce a different result and/or performance. – The Light May 18 '12 at 11:37
2

@William - Not different result. The most obvious performance issue I have seen is when you are doing string concatenation using the `for xml` trick. I have seen both a version that uses `distinct` and a version that uses `group by` and the `group by` version is a lot faster. But even this simple query when executed over 500000 rows generating 40 groups shows a performance difference. For me, `group by` takes 80 ms and `distinct` takes 105 ms. – Mikael Eriksson May 18 '12 at 11:48
no, it can certainly produce a different result. For example, imagine you have a calculated column such as ROW_NUMBER() or a user defined function which returns a value. In such scenarios, for Distinct first the calculated column gets executed for each row in the table then the Scalar columns are added whereas when using Group By, first the scalar columns are grouped by then for each group the calculated column gets calculated. So they can produce different results because Distinct and Group By execute the Calculated Columns in a different order. – The Light May 18 '12 at 12:04
@William Yes, you are right. Replace `getdate()` with `newid()` in the above queries and you will have a different result. – Mikael Eriksson May 18 '12 at 12:15

score 7 · Answer 2 · answered May 18 '12 at 12:11

7

Here are 2 examples, one for producing a different result and the other for a different performance:

Example for producing different performance

And the second example:

Example for producing different result

answered May 18 '12 at 12:11

The Light

2 Answers2