SQL CTE for a numbers table is fast for static value, but slow for table value

Question

I'm trying to output the values of a table multiple times, based on a column in that table.

I tried to use CTE to make a numbers table on the fly:

WITH cte AS
(
    SELECT
        ROW_NUMBER() OVER (ORDER BY (select 0)) AS i
    FROM
        sys.columns c1 CROSS JOIN sys.columns c2 CROSS JOIN sys.columns c3
)
select *
from myTable, cte
WHERE i <= myTable.timesToRepeatColumn
  and myTable.id = '209386'

This SQL seems to take forever to run, so it seems to be trying to run the entire CTE before joining.

If I replace myTable.timesToRepeatColumn with a static value (say 10000), the query returns virtually instantly. So it seems to be doing the where i <= before fully cross-joining the CTE's table.

How can I tell SQL to do the where statement first like it does with a static number?

BTW `myTable.timesToRepeatColumn` has a value of 3 for my test row so it should really should be near-instant to run. — Brian Heward, Oct 21 '20 at 21:15

score 1 · Accepted Answer · answered Oct 21 '20 at 21:20

1

you can use recursive cte to achieve your goal

WITH cte AS (
SELECT
  *
, timesToRepeatColumn as level
FROM
    myTablewhere 
WHERE myTable.id = '209386'
UNION ALL
SELECT
  *
, level -1  as level
FROM
    cte 
WHERE 
    level > 0
)


SELECT * FROM cte

answered Oct 21 '20 at 21:20

eshirvana

23,227
3
22
38

I have to specify the columns here not use * or I get an error `All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.` Also this includes the row where level is 0 for some reason, but I can change the final where clause to level > 1 to fix that – Brian Heward Oct 21 '20 at 21:29
@BrianHeward you got the idea ;) – eshirvana Oct 21 '20 at 21:33
Note that recursive CTEs can also give a performance hit as they are loops e.g., see comments in https://stackoverflow.com/questions/64472923/how-do-i-get-the-last-12-months-date-of-today . If you can do it via sets logic it will typically give better performance. Of course, in cases where numbers of iterations are small though (which I think is the case here), the performance hit is negligable (e.g., the difference between 15 milliseconds and 10 milliseconds... doesn't make a difference unless you're running it very very frequently) – seanb Oct 21 '20 at 23:45

score 0 · Answer 2 · answered Oct 21 '20 at 21:21

0

CTEs in SQL Server are not necessarily run 'independently'. SQL (in SQL Server, etc) is declarative, which means you tell it what you want, not how to do it.

It the query optimiser determines that it can do it better by doing something differently, it will.

A good example is

IF EXISTS(SELECT * FROM test) PRINT 'X';
IF (SELECT COUNT(*) FROM test) > 0 PRINT 'Y';
IF (SELECT COUNT(*) FROM test) > 1 PRINT 'Z';

If it was doing what you told it do, the query plans for the second and third would basically be the same. However, when you run it, the query plans for the first and second are the same; the third differs.

When you hard-code the value (e.g., 10,000), the query optimiser can use that hardcoded value to determine what to do. In this case, it probably determines it doesn't need to run the full CTE, just run it until you get 10,000 rows.

However, if you use a value that can vary (e.g., myTable.timesToRepeatColumn), then the query optimiser often makes a query plan that would word for any value. As such, it makes a query plan that is not fantastic for your situation - probably creating the full CTE in memory before using it. If sys.columns has 100 rows, that's 100^3 rows it creates. If it's 1000, it's 1000^3 e.g., 1,000,000,000. Likely you have more than 1000 rows.

answered Oct 21 '20 at 21:21

seanb

6,272
2
4
22

Yep, that makes sense for why it's doing the wrong thing.... but how can I get it to use the better query plan where it doesn't need to run the full cte? – Brian Heward Oct 21 '20 at 21:23
You could use a faster method of creating a numbers table - e.g., see https://stackoverflow.com/questions/1393951/what-is-the-best-way-to-create-and-populate-a-numbers-table or the ones I usually use https://stackoverflow.com/questions/64383302/need-to-add-3-months-to-each-value-within-a-column-based-on-the-1st-3-months/64462525#64462525 (which is fast but maybe not as fast as some of the others). If it's in a stored procedure/script (not a view) you could also try first creating a temporary numbers table with a PK e.g., `CREATE TABLE #nums (id int PRIMARY KEY)` and then populating/using that. – seanb Oct 21 '20 at 21:40
Alternatively, in your CTE - you could try firstly limiting the number of rows to a maximum (e.g., if you don't need more than 1000, put `SELECT TOP 1000` at the front. Also try to generate the numbers without a sort e.g., by using an auto-incrementing field - it has a chance of helping. – seanb Oct 21 '20 at 21:42
Err the system I'm using here is effectively the same as method 6 from that linked post and method 7 isn't a huge speed increase over 6. The problem isn't how fast the numbers table is generated. The problem is knowing when to stop generating numbers. My code takes less than a second when I specify a hardcoded maximum. The problem is I don't know when I'm writing the code what that maximum should be. – Brian Heward Oct 21 '20 at 21:48
@Brian - try putting the TOP part in e.g., `SELECT TOP 10000 ROW_NUMBER() ...` – seanb Oct 21 '20 at 22:20

SQL CTE for a numbers table is fast for static value, but slow for table value

2 Answers2