Tl;DR
I get different results when using a CTE as compared to using temp tables. Why?
Problem
Suppose I have the following data structures:
#numbers
A simple number table from 1 to 10.
DROP TABLE IF EXISTS #numbers;
WITH rcte(n) AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 AS n
FROM rcte
WHERE n < 10
)
SELECT n
INTO #numbers
FROM rcte;
SELECT TOP 3 n
FROM #numbers;
n |
---|
1 |
2 |
3 |
#max
I can now use #numbers
to create a table with 3 rows and columns id_col
and max_nr
, a random number between 2 and 5:
DROP TABLE IF EXISTS #max;
DECLARE @min TINYINT = 2
, @max TINYINT = 5;
SELECT CONCAT(N'id_', n) AS id_col
, ABS(CHECKSUM(NEWID())) % (@max - @min + 1) + @min AS max_nr
INTO #max
FROM #numbers
WHERE n <= 3;
SELECT *
FROM #max; --- max_nr will of course look different each run
id_col | max_nr |
---|---|
id_1 | 5 |
id_2 | 2 |
id_3 | 3 |
#result
Now I can easily generate the table I need, namely a table where for each id_col
I have each number up to max_nr
DROP TABLE IF EXISTS #result;
SELECT id_col
, n
FROM #max
INNER JOIN #numbers
ON n <= max_nr;
id_col | n |
---|---|
id_1 | 1 |
id_1 | 2 |
id_1 | 3 |
id_1 | 4 |
id_1 | 5 |
id_2 | 1 |
id_2 | 2 |
id_3 | 1 |
id_3 | 2 |
id_3 | 3 |
CTE
If I try to do the same with A CTE
I get sometimes (depends on the random number) strange results. My guess is that the random part is recalculated for each row in the join leading to different results, if this is the case, is the only solution to use a temp table or any chance to solve that with a CTE
?
N.B. Re run the following code a couple of times and you will eventually see that there are "holes" i.e. missing numbers:
WITH cte_max (id_col, max_nr) AS (
SELECT CONCAT(N'id_', n) AS id_col
, ABS(CHECKSUM(NEWID())) % (@max - @min + 1) + @min AS max_nr
FROM #numbers
WHERE n <= 3
)
SELECT id_col
, n
FROM cte_max
INNER JOIN #numbers
ON n <= max_nr;
For instance I got this result in one run:
id_col | n | comment |
---|---|---|
id_1 | 1 | |
id_1 | 2 | |
id_1 | 4 | 3 missing!! |
id_1 | 5 | |
id_2 | 1 | |
id_2 | 2 | |
id_2 | 3 | |
id_3 | 1 | |
id_3 | 2 |
You see that the three is missing. This won't happen for all the runs (sometimes all numbers are in order, but that they are missing from time to time, I clearly see that I am overlooking something)
Question
Can somebody explain to me, why this behaviour is to be expected? Has it something to do how SQL Server parallelizes its tasks and in the inner join the random number is yet not materialized and gets re-calculated for each join result? If so, is there a work around beside using temp tables?