Does a calculation in a HAVING and in the SELECT mean it will be performed twice?

Question

Say I have this query:

SELECT CompanyId, COUNT(*) as Total
FROM Customer
GROUP BY CompanyId
HAVING COUNT(*) > 100

I have the COUNT(*) twice in my query. Does this mean the COUNT is performed twice?

This is a simple example, but when I have a more complex calculation (something like SUM(Weight) / COUNT(*)) I'm worried it might impact performance. Or will any performance impact be negligable?

I'm using MS SQL 2012, and can't do HAVING Total > 100.

Similar question https://dba.stackexchange.com/questions/119575/how-to-avoid-invoking-functions-twice-when-using-group-by-and-having/119578 — OutOfMind, Aug 25 '17 at 09:56

score 6 · Accepted Answer · answered Aug 25 '17 at 11:36

If you have a healthy interest in how your queries are processed under the covers, get familiar with execution plans and how to read them. All of what follows was established experimentally using those.

Aggregates are not computed more than once, but the expressions involving them are. Consider:

SELECT CompanyId, SUM(Weight) / COUNT(*)
FROM Customer
GROUP BY CompanyId
HAVING SUM(Weight) / COUNT(*) > 100

SUM(Weight) and COUNT(*) will be computed only once, but the division will be performed twice (once when filtering, and once where selecting). This has no measurable impact on performance, of course -- the key is that it minimizes the number of times it has to go through all the data.

This means that even if your HAVING is completely different from your SELECT list, the table will still only be scanned once and aggregated once:

SELECT CompanyId, MAX(Weight), MIN(Weight), COUNT(*) as Total
FROM Customer
GROUP BY CompanyId
HAVING MAX(Weight) > 2 * MIN(Weight) AND AVG(Weight) > 0.5

There are four aggregates here: MAX(Weight), MIN(Weight), AVG(Weight) and COUNT(*).¹ The optimizer will compute all of those in one pass, group the whole by CompanyId, apply the HAVING filter and then select the desired result.²

Disclaimer: as with all claims about what the optimizer does, all this is subject to change in any release of SQL Server and may vary with trace flags, statistics, indexes, and the specifics of particular queries. The above is true for SQL Server 2012 and 2016, at least, for two particular databases, at least, where indexes play no role.

AVG isn't actually an aggregate on its own; internally the optimizer expands it to SUM / COUNT(*), with a check to prevent division by zero. So the aggregates are actually MAX, MIN, SUM and COUNT.
This is the case for a sequential plan. For a parallel plan, things get a little more complicated with multiple parallel scans that have to be joined together, but it's still true that aggregates aren't computed more than once (where possible).

Gordon Linoff · Answer 2 · 2017-08-25T10:15:00.783

1

In all likelihood, the COUNT(*) will be done twice, although the decision is really up to the query optimizer.

And, this pretty much makes no difference at all.

The effort expended in aggregating data is in moving data around, not in the aggregation functions, especially really simply ones like COUNT(*). Some aggregation functions are more expensive (COUNT(DISTINCT) comes to mind), but in general, data movement is more expensive than a simple aggregation function.

You don't mention the database, but many allow:

HAVING Total > 100

Or the use of a subquery/CTE usually has no effect on performance.

edited Aug 25 '17 at 10:15

answered Aug 25 '17 at 09:56

Gordon Linoff

1,242,037
58
646
786

Ah, yes I didn't mention the database. Updated it now. – Peter Aug 25 '17 at 10:37

Does a calculation in a HAVING and in the SELECT mean it will be performed twice?

2 Answers2