0

In SQL Server 2019, analytic functions are not returning the results that I would expect in the context of recursive common table expressions. Consider the following non-recursive T-SQL query:

WITH SourceData (RowNum, Uniform, RowVal) AS (
  SELECT 1, 'A', 'A' UNION ALL
  SELECT 2, 'A', 'B' UNION ALL
  SELECT 3, 'A', 'C' UNION ALL
  SELECT 4, 'A', 'D'
),
RecursiveCte0 (RowNum, Uniform, RowVal, MinVal, SomeSum, RowNumCalc, RecursiveLevel) AS (
  SELECT RowNum, Uniform, RowVal, RowVal, RowNum, CAST(RowNum AS BIGINT), 0
  FROM SourceData
),
RecursiveCte1 (RowNum, Uniform, RowVal, MinVal, SomeSum, RowNumCalc, RecursiveLevel) AS (
  SELECT * FROM RecursiveCte0
  UNION ALL
  SELECT
    RowNum, Uniform, RowVal,
    MIN(MinVal) OVER (PARTITION BY Uniform),
    SUM(RowNum) OVER (PARTITION BY Uniform),
    ROW_NUMBER() OVER (PARTITION BY Uniform ORDER BY RowNum),
    RecursiveLevel + 1
  FROM RecursiveCte0
)
SELECT *
FROM RecursiveCte1
ORDER BY RecursiveLevel, RowNum;

Results:

RowNum Uniform RowVal MinVal SomeSum RowNumCalc RecursiveLevel
1      A       A      A      1       1          0
2      A       B      B      2       2          0
3      A       C      C      3       3          0
4      A       D      D      4       4          0
1      A       A      A      10      1          1
2      A       B      A      10      2          1
3      A       C      A      10      3          1
4      A       D      A      10      4          1

As expected, the MIN, SUM, and ROW_NUMBER functions generate the appropriate values based on all rows from RecursiveCte0. I would expect the following recursive query to be logically identical to the non-recursive version above, but it produces different results:

WITH SourceData (RowNum, Uniform, RowVal) AS (
  SELECT 1, 'A', 'A' UNION ALL
  SELECT 2, 'A', 'B' UNION ALL
  SELECT 3, 'A', 'C' UNION ALL
  SELECT 4, 'A', 'D'
),
RecursiveCte (RowNum, Uniform, RowVal, MinVal, SomeSum, RowNumCalc, RecursiveLevel) AS (
  SELECT RowNum, Uniform, RowVal, RowVal, RowNum, CAST(RowNum AS BIGINT), 0
  FROM SourceData
  UNION ALL
  SELECT
    RowNum, Uniform, RowVal,
    MIN(MinVal) OVER (PARTITION BY Uniform),
    SUM(RowNum) OVER (PARTITION BY Uniform),
    ROW_NUMBER() OVER (PARTITION BY Uniform ORDER BY RowNum),
    RecursiveLevel + 1
  FROM RecursiveCte
  WHERE RecursiveLevel < 1
)
SELECT *
FROM RecursiveCte
ORDER BY RecursiveLevel, RowNum;

Results:

RowNum Uniform RowVal MinVal SomeSum RowNumCalc RecursiveLevel
1      A       A      A      1       1          0
2      A       B      B      2       2          0
3      A       C      C      3       3          0
4      A       D      D      4       4          0
1      A       A      A      1       1          1
2      A       B      B      2       1          1
3      A       C      C      3       1          1
4      A       D      D      4       1          1

For each of the three analytic functions, it appears that the grouping is only being applied within the context of each individual row, rather than all of the rows at that level. This unexpected behavior also happens if I partition over (SELECT NULL). I would expect the analytic functions to apply to the entire recursion level, as per MSDN:

Analytic and aggregate functions in the recursive part of the CTE are applied to the set for the current recursion level and not to the set for the CTE. Functions like ROW_NUMBER operate only on the subset of data passed to them by the current recursion level and not the entire set of data passed to the recursive part of the CTE.

Why do these two queries produce different results? Is there a way to effectively use analytic functions with recursive common table expressions?

novog
  • 121
  • 11
  • No, you cannot do this in the way you would like. You are misreading MSDN, it is specifically excluding your case. This is because the rCTE is processed in a stack fashion: each subset is processed row by row (much more similar to Oracle's `CONNECT BY` than to the written SQL), therefore the row number only applies to that row with its immediate join. – Charlieface Mar 30 '21 at 15:34
  • Does this answer your question? [Why do Recursive CTEs run analytic functions (ROW\_NUMBER) procedurally?](https://stackoverflow.com/questions/9965317/why-do-recursive-ctes-run-analytic-functions-row-number-procedurally) – Charlieface Mar 30 '21 at 15:34
  • @Charlieface Thanks; the existing question is related, but not quite the same, I think. It assumed that the analytic function should apply to the entire recursive part of the rCTE. MSDN and the answer both explain that the aggregate/analytic functions apply only to the current recursion level, as I had expected but am not seeing. Neither states that the function's scope is restricted to individual rows. – novog Mar 30 '21 at 16:04
  • True, that answer doesn't actually answer the question at all, but the *question* is the same if you look carefully. See also https://explainextended.com/2009/11/18/sql-server-are-the-recursive-ctes-really-set-based/ which is linked over there – Charlieface Mar 30 '21 at 16:08
  • I just found another [question](https://stackoverflow.com/questions/39445320/row-number-is-not-working-in-cte) whose answer agrees on the row-by-row explanation, and also points to the MSDN documentation. However, I still don't think that MSDN says this; it says "on the subset of data passed to them by the current recursion level", nothing about row-by-row.. – novog Mar 30 '21 at 16:09
  • I think it's just typical bad wording in MSDN. Either way, this is how it works, and how it has always worked. It is not a bug. By the way, considering the poor performance of rCTE, plus your problem, you are probably better off using a table variable and a `while` loop, horrible as it looks – Charlieface Mar 30 '21 at 16:11

0 Answers0