0

I have been refactoring some old queries recently and noticed that a lot of them repeat ISNULL() in the GROUP BY clause, where it is used in the SELECT clause. I feel in my bones that removing the ISNULL() in the GROUP BY clause will improve performance, but I can't find any documentation on whether it is actually likely to or not. Here's the sort of thing I mean:

SELECT
     ISNULL(Foo,-1) AS Foo
    ,ISNULL(Bar,-1) AS Bar
    ,SUM(This)      AS This
    ,SUM(That)      AS That
FROM
            dbo.ThisThatTable   AS ThisThat
LEFT JOIN   dbo.FooBarTable     AS FooBar   ON ThisThat.FooBarId = FooBar.Id
GROUP BY
     ISNULL(Foo,-1)
    ,ISNULL(Bar,-1);
GO

The above is the way I keep coming across - When there is grouping on the Foo column, the SELECT and the GROUP BY for selected columns match exactly. The example below is a possible alternative - some possibly unnecessary ISNULL() calls have been removed, and the SELECT and GROUP BY clauses no longer match.

SELECT
     ISNULL(Foo,-1) AS Foo
    ,ISNULL(Bar,-1) AS Bar
    ,SUM(This)      AS This
    ,SUM(That)      AS That
FROM
            dbo.ThisThatTable   AS ThisThat
LEFT JOIN   dbo.FooBarTable     AS FooBar   ON ThisThat.FooBarId = FooBar.Id
GROUP BY
     Foo
    ,Bar;
GO

I suppose maybe when the SELECT and GROUP BY clauses match, the optimiser only has to do the ISNULL() calculation once to know what is going on, so it might be theoretically more performative to group by the results that are actually selected? Alternatively, maybe it is better to avoid adding a second set of ISNULL() calls that don't change the granularity of the data at all... Maybe the optimiser is clever enough to realise that the NULLS in the grouping are (in this case) -1s in the selection...?

I personally would prefer removing any unnecessary functions, especially once that might affect index usage but when I look online, the references to performance are all like the answers here, about using ISNULL() in the WHERE clause, which I already know to avoid.

I also suspect that any gains are going to be vanishingly small, so this is really asking for an academic or theoretical answer, but as I work, I keep wondering and it bugs me, so I thought I would ask if anyone has any thoughts.

Dale K
  • 25,246
  • 15
  • 42
  • 71
High Plains Grifter
  • 1,357
  • 1
  • 12
  • 36
  • 3
    When you looked at the execution plan did you notice any difference? – Dale K May 20 '22 at 10:36
  • 'old queries ' - suspect old versions of sql-server - you maybe want to browse for when aliases became allowed in group by. – P.Salmon May 20 '22 at 11:48
  • The version with `ISNULL()` in the grouping had more Compute Scalar and Parallelism (Repartition Streams) actions in the execution plan - the plan without seemed much simpler, less actions etc but the query performance was about the same... – High Plains Grifter May 20 '22 at 12:00
  • @P.Salmon I don't have access to any Postgresql or the like... I think therefore I cannot use aliases in the GROUP BY clause (I may be wrong?) – High Plains Grifter May 20 '22 at 12:07
  • You are correct, @HighPlainsGrifter, an alias of a column *cannot* be used in the `GROUP BY` this is true to the latest version of SQL Server (2019) and in Azure SQL Database (which has some bleeding edge features). This can also be confirmed in [Logical Processing Order of the SELECT statement](https://learn.microsoft.com/en-us/sql/t-sql/queries/select-transact-sql?view=sql-server-ver15#logical-processing-order-of-the-select-statement), where you can see that the `GROUP BY` is processed *prior* to the `SELECT`; meaning that the aliases haven't been parsed yet. – Thom A May 20 '22 at 12:44
  • Assuming that there are rows with a value of `-1`, @Charlieface . If `Foo` and `Bar` have a `CHECK CONSTRAINT` to ensure they have a positive value then the `-1` value would be a valid alternative group to `NULL`. – Thom A May 20 '22 at 12:54

3 Answers3

2

Non-aggregated columns in SELECT clauses generally must precisely match the ones in GROUP BY clauses. If I were you, and I were dealing with tested production code, I would not make the change you propose.

Edit the match between non-aggregated SELECT columns and GROUP BY columns is necessary for GROUP BY. If the columns in SELECT are 1:1 dependent on the columns in GROUP BY, it will work. Otherwise the results are ambiguous.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Is that because the select happens after the grouping is made or something, so no re-computing needs to be done to group, as opposed to non-matching? – High Plains Grifter May 20 '22 at 12:03
  • It's a good point that I am making the (currently true) assumption that there is a 1:1 mapping between `ISNULL([Column],-1)` and `[Column]` - that may not always be true and maybe that is the best reason to keep the Grouping and Selection matched, even at the cost of an infinitesimal performance loss. – High Plains Grifter May 20 '22 at 14:25
1

Internally, SQL does not really have two copies of each ISNULL. They are all flattened together in the internal tree used during compilation. So, this level of optimization is not useful to consider in SQL Server. A query without any ISNULL in it would probably perform a bit faster and potentially a lot faster depending on the rest of the schema and query. However, the ISNULL in the select list and the GROUP BY list are not executed twice in most cases within SQL - this level of detail can show up in showplan, but it's often below the level of detail most people would care to examine.

Conor Cunningham MSFT
  • 4,151
  • 1
  • 15
  • 21
  • yeah, I guess I figured as much, it's just... after seeing it 10 times in the day and wondering every time whether to change it, I thought I'd ask the question! I am not going to make the change, I think, for exactly the reason you give. – High Plains Grifter May 20 '22 at 14:22
1

There are a few different aspects to consider here:

  • Referring to the same value multiple times in the same scope
    In most situations, the optimizer is clever enough to collapse these into calculating them once. The fact that you have a GROUP BY over them makes this even more likely.

  • Is it faster to group when the value is guaranteed to not be null?
    Possibly, although I doubt the difference is measurable.

  • The SELECT does not have to match exactly, it only needs to be functionally dependent on GROUP BY columns and aggregation functions. It may not be functionally dependent on any other columns.

  • The most important thing top consider: indexing.
    This is much, much more important than the other considerations. When grouping, if you can hit an index then it will go much faster, because it can remove sorting and just use Stream Aggregate. This is not possible if you use ISNULL in the GROUP BY (barring computed columns or indexed views).

Note that your results will not be the same: the first example collapses the NULL group into the -1 group. The second example does not, so you may want to remove the ISNULL from the SELECT also, in order to differentiate them. Alternatively, put a WHERE ... IS NOT NULL instead.

Charlieface
  • 52,284
  • 6
  • 19
  • 43