Got to do some tests wrapping things in functions and using different scenarios to fill a table.
Options
Mikael's proposed solution can be improved by removing the WHERE clause WHERE N <= @count
and adding a TOP clause TOP (@count)
. That reduces it's execution time almost six-fold for me in the following:
This is the fastest algorithm! Now you know it :o)
create function dbo.Numbers2(@count bigint)
RETURNS TABLE RETURN
--===== Itzik's CROSS JOINED CTE method
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
E02(N) AS (SELECT 1 FROM E00 a, E00 b),
E04(N) AS (SELECT 1 FROM E02 a, E02 b),
E08(N) AS (SELECT 1 FROM E04 a, E04 b),
E16(N) AS (SELECT 1 FROM E08 a, E08 b),
E32(N) AS (SELECT 1 FROM E16 a, E16 b),
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
SELECT TOP (@count) N
FROM cteTally
Compared to the slower adaptation:
create function dbo.Numbers3(@count bigint)
RETURNS TABLE RETURN
--===== Itzik's CROSS JOINED CTE method
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
E02(N) AS (SELECT 1 FROM E00 a, E00 b),
E04(N) AS (SELECT 1 FROM E02 a, E02 b),
E08(N) AS (SELECT 1 FROM E04 a, E04 b),
E16(N) AS (SELECT 1 FROM E08 a, E08 b),
E32(N) AS (SELECT 1 FROM E16 a, E16 b),
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
SELECT N
FROM cteTally
WHERE N <= @count
The solution proposed in https://stackoverflow.com/a/14386663/481812 has similar performance to Numbers2:
create function dbo.Numbers(@count bigint)
RETURNS TABLE RETURN
with byte (n) as (
-- shortened, see the other answer for full code
Tests
- Inserting into a temp heap is fastest (600),
- inserting into temp table with PK clustered index slower (3000) and finally
- filling a physical table with index the slowest (10000). Additional comparison is made with
- filling a temp table with identity PK, as Gregory suggested, from the same CTE style rows generation is pretty much the same (3000) as clustered temp #2 in performance.
- Gregory's proposal was too slow to consider (21000), though. To amaze you,
- the original example would not be realistic to measure with a million rows; 10000 was used instead, which is 100 times less than in the other tests, to bring it down to reasonable time (15000). Assuming it would scale proportionally - best case scenario - it should theoretically complete 1m rows in 25 minutes (1500000).
Ben's suggestion (tried) would probably make a difference if data were not sorted - here we're inserting sorted data, hence always appending, no performance gain building the index after.
Test group 1 - temp heap inserts
CREATE TABLE #Numbers ([Number] [int])
insert #Numbers select n from dbo.Numbers(1000000)
drop table #Numbers
-- 480 - 900 ms, avg 600 ms
CREATE TABLE #Numbers ([Number] [int])
insert #Numbers select n from dbo.Numbers2(1000000)
drop table #Numbers
-- 440 - 800 ms, avg 550 ms
-- just in case you wondered how it scales, 30 times more went in in 14000,
-- Numbers and Numbers2 scale linearly
CREATE TABLE #Numbers ([Number] [int])
insert #Numbers select n from dbo.Numbers3(1000000)
drop table #Numbers
-- 2700 - 4000 ms, avg 3200 ms
Test group 2 - inserting in temp cluster
CREATE TABLE #Numbers ([Number] [int] primary key)
insert #Numbers select n from dbo.Numbers(1000000)
drop table #Numbers
-- 1900 - 4000 ms, avg 3000 ms
CREATE TABLE #Numbers ([Number] [int] primary key)
insert #Numbers select n from dbo.Numbers2(1000000)
drop table #Numbers
-- 1900 - 4000 ms, avg 3000 ms
CREATE TABLE #Numbers ([Number] [int] primary key)
insert #Numbers select n from dbo.Numbers3(1000000)
drop table #Numbers
-- 4000 - 7000 ms, avg 5000 ms
Test group 3 - inserting in physical table cluster
CREATE TABLE PNumbers ([Number] [int] primary key)
insert PNumbers select n from dbo.Numbers(1000000)
drop table PNumbers
-- 7000 - 12000 ms, avg 10000 ms
CREATE TABLE PNumbers ([Number] [int] primary key)
insert PNumbers select n from dbo.Numbers2(1000000)
drop table PNumbers
-- 7000 - 12000 ms, avg 10000 ms
CREATE TABLE PNumbers ([Number] [int] primary key)
insert PNumbers select n from dbo.Numbers3(1000000)
drop table PNumbers
-- 7000 - 12000 ms, avg 10000 ms
Test 4 - filling temp clustered identity column from rows generated
CREATE TABLE #Numbers ([Number] [int] identity primary key, x bit)
INSERT INTO #Numbers(x) select 1 from dbo.Numbers2(1000000)
drop table #Numbers
-- 2000 - 4000 ms, avg 3000 ms
Test 5 - filling temp clustered identity column in a loop
CREATE TABLE #Numbers ([Number] [int] identity primary key)
declare @cnt int = 0
while (@cnt<1000000)
BEGIN
INSERT INTO #Numbers DEFAULT values
set @cnt = @cnt+1
END
drop table #Numbers
-- 20000 - 22000 ms, avg 21000 ms
Conclusions
The Numbers(n) and Numbers2(n) options perform consistently the same (best). Numbers3(n) is a way slower due to the WHERE clause. Other tested scenarios are too slow to consider.
Inserting into an unindexed temp heap is fastest. As soon as you add any index (tried) to the column (either PK clustered or secondary non unique), it's not about the number generation anymore but rather about index inserts (and possibly IO). The worst is inserting into a physical table. This must be because temp tables are not necessarily written to disk (tempdb).
Proposal
According to measurements, you'll gain by
- using CTE + JOIN + ROW_NUMBER + TOP based solution as opposed to loops or recursive CTE,
- not using an index - using a heap, and
- by using a temporary table rather than physical table.
You may avoid the physical or temp table altogether and just use the function, but then in your subsequent queries you'll be chancing the optimizer as you'll be flying without statistics on the numbers. You may then solve this with query hints which is not nice, but works... Though, looking at the execution plan, row numbers at least for dbo.Numbers are estimated correctly.