
Long story short: I am testing a system that purges entries from a table over a network connection, and the functionality is expected to handle at most just over 2 billion entries.

I need to stress test this to be certain.

Here's my test script (at best it generates 9.8 million rows in ten minutes):

DECLARE @I INT=0

WHILE @I <2000000001
BEGIN
    INSERT INTO "Table here"
    VALUES(@I)  

    SET @I=@I+1
END

Can anyone suggest anything, or give me an idea what the upper limits of my test environment might be in this situation?

Julie Moss
  • "I am testing a system that purges entries from a table over a network connection, and the functionality is expected to handle at most just over 2 billion entries." - sounds like a bad test. What is the purge procedure? – Dai Feb 13 '18 at 21:47
  • If you *really* need a table that big, I might recommend `BULK INSERT` from a 2-billion-row file on disk as that would be faster - though the file would necessarily be at least 16GB+ in size (because if you have over 2 billion rows with an `IDENTITY` column or primary key then you'll need to use `bigint`, which is 8 bytes long, and `8 * 2^31 rows == 17,179,869,184 bytes`). – Dai Feb 13 '18 at 21:47
  • Maybe this question can help you fill the table faster: https://stackoverflow.com/questions/21425546/how-to-generate-a-range-of-numbers-between-two-numbers – jontro Feb 13 '18 at 21:51
  • It's an extreme test, and I know it. It's for long-term regression and scheduled purging of transaction logs. There have been cases of the log tables reaching nearly a billion rows, and the way the function is built, it should be able to handle 2 billion. – Julie Moss Feb 13 '18 at 21:51
  • INSERT INTO T SELECT * FROM T will double your table size - it seems to be pretty quick – SQL Hacks Feb 13 '18 at 21:55
  • With that syntax it reads it as an attempt to create duplicates, and immediately ends the query. – Julie Moss Feb 13 '18 at 23:01
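As a side note on the doubling suggestion above: it can work if each pass offsets the ID values instead of copying them verbatim, so that a unique constraint or primary key does not reject the copies. A minimal sketch, assuming a test table dbo.TableHere with a single int ID column already seeded with at least one row (the table and column names are assumptions for illustration):

--hypothetical table/column names; seed the table with at least one row first
DECLARE @MaxID int = (SELECT MAX(ID) FROM dbo.TableHere);

--re-insert every existing row with its ID shifted past the current maximum,
--doubling the row count on each execution
INSERT INTO dbo.TableHere (ID)
SELECT ID + @MaxID
FROM dbo.TableHere;

Each run doubles the row count, so reaching 2 billion rows takes roughly 31 runs from a single seed row; by then the ID values are near the int limit (a bigint column may be safer), and each later pass rewrites an ever larger table, so the batched approaches in the answer below are typically faster.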

1 Answer


Below is a method using CROSS JOIN in batches of 10M. This loaded 2 billion rows in about 6 minutes on my desktop machine.

--create and load 2B row table using cross join CTE source
CREATE TABLE dbo.TableHere(
    ID int NOT NULL
);

DECLARE
      @TargetRowCount int = 2000000000
    , @RowsInserted int = 0;

WHILE @RowsInserted < @TargetRowCount
BEGIN

    WITH 
         t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n)) --10 rows
        ,t1k AS (SELECT 0 AS n FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c) --1,000 rows (10x10x10)
        ,t10m AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t1k AS a
        CROSS JOIN t1k AS b CROSS JOIN t10 AS c) --10,000,000 rows (1,000x1,000x10)
    INSERT INTO dbo.TableHere WITH(TABLOCKX)
        SELECT num + @RowsInserted
        FROM t10m;

    SET @RowsInserted += @@ROWCOUNT;

    DECLARE @CurrentTimestampString varchar(1000) = FORMAT(SYSDATETIME(),'yyyy-MM-dd HH:mm:ss');
    RAISERROR('%s: %d of %d rows inserted', 0, 0, @CurrentTimestampString, @RowsInserted, @TargetRowCount) WITH NOWAIT;

END;
GO

Consider creating a permanent tally table with incremental values. Not only will this improve the performance of test data generation, but a materialized tally table can also be reused for a variety of other common tasks, such as generating incremental value ranges, datetime sequences, etc. (a reuse sketch follows the script below).

Leveraging the tally table improved performance by about 30% (roughly 4.5 minutes) on my machine by avoiding repeating the cross join on every iteration. EDIT: down to about 2.5 minutes after a processor upgrade (Intel i9-12900K).

SET NOCOUNT ON;
--create and load 10M row tally table
DROP TABLE IF EXISTS dbo.Tally;
CREATE TABLE dbo.Tally(
     Num int NOT NULL CONSTRAINT PK_Tally PRIMARY KEY
);
WITH 
     t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
    ,t1k AS (SELECT 0 AS n FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c)
    ,t10m AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t1k AS a
    CROSS JOIN t1k AS b CROSS JOIN t10 AS c)
INSERT INTO dbo.Tally WITH(TABLOCKX) (Num) 
    SELECT num 
    FROM t10m;
ALTER INDEX PK_Tally ON dbo.Tally REBUILD WITH (FILLFACTOR=100);
GO

--create and load 2B row table using tally table source
DROP TABLE IF EXISTS dbo.TableHere;
CREATE TABLE dbo.TableHere(
    ID int NOT NULL
);

DECLARE
      @TargetRowCount int = 2000000000
    , @RowsInserted int = 0;

WHILE @RowsInserted < @TargetRowCount
BEGIN

    INSERT INTO dbo.TableHere WITH(TABLOCKX) (ID)
        SELECT Num + @RowsInserted
        FROM dbo.Tally;

    SET @RowsInserted += @@ROWCOUNT;

    DECLARE @CurrentTimestampString varchar(1000) = FORMAT(SYSDATETIME(),'yyyy-MM-dd HH:mm:ss');
    RAISERROR('%s: %d of %d rows inserted', 0, 0, @CurrentTimestampString, @RowsInserted, @TargetRowCount) WITH NOWAIT;

END;
GO
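As a quick illustration of the tally-table reuse mentioned above, the same dbo.Tally table can drive other sequences. A minimal sketch generating a daily datetime sequence (the start date and row count are arbitrary, purely for illustration):

--generate one row per day for a year starting from an arbitrary date
SELECT DATEADD(day, Num - 1, CAST('2018-01-01' AS date)) AS CalendarDate
FROM dbo.Tally
WHERE Num <= 365
ORDER BY Num;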
Dan Guzman