
Long story short: I am testing a system that purges entries from a table over a network connection, and the functionality is expected to handle at most just over 2 billion entries.

I need to stress test this to be certain.

Here's my test script (at best it generates 9.8 million rows in ten minutes):

DECLARE @I INT=0

WHILE @I <2000000001
BEGIN
    INSERT INTO "Table here"
    VALUES(@I)  

    SET @I=@I+1
END

Can anyone suggest anything, or give me an idea what the upper limits of my test environment might be in this situation?

Julie Moss
  • "I am testing a system that purges entries from a table over a network connection, and the functionality is expected to handle at most just over 2 billion entries." - sounds like a bad test. What is the purge procedure? – Dai Feb 13 '18 at 21:47
  • If you *really* need a table that big, I might recommend `BULK INSERT` from a 2-billion-row file on disk as that would be faster - though the file would necessarily be at least 16GB+ in size (because if you have over 2 billion rows with an `IDENTITY` column or primary key then you'll need to use `bigint`, which is 8 bytes long, and `8 * 2^31 rows == 17,179,869,184 bytes`). – Dai Feb 13 '18 at 21:47
  • Maybe this question can help you fill the table faster: https://stackoverflow.com/questions/21425546/how-to-generate-a-range-of-numbers-between-two-numbers – jontro Feb 13 '18 at 21:51
  • It's an extreme test, and I know it. It's for long-term regression and scheduled purging of transaction logs. There have been cases of the log tables reaching nearly a billion rows, and the way the function is built, it should be able to handle 2 billion. – Julie Moss Feb 13 '18 at 21:51
  • INSERT INTO T SELECT * FROM T will double your table size - it seems to be pretty quick – SQL Hacks Feb 13 '18 at 21:55
  • With that syntax it reads it as an attempt to create duplicates, and immediately ends the query. – Julie Moss Feb 13 '18 at 23:01
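As a side note on the doubling suggestion above: it can work if each pass offsets the ID values instead of copying them verbatim, so that a unique constraint or primary key does not reject the copies. A minimal sketch, assuming a test table dbo.TableHere with a single int ID column already seeded with at least one row (the table and column names are assumptions for illustration):

--hypothetical table/column names; seed the table with at least one row first
DECLARE @MaxID int = (SELECT MAX(ID) FROM dbo.TableHere);

--re-insert every existing row with its ID shifted past the current maximum,
--doubling the row count on each execution
INSERT INTO dbo.TableHere (ID)
SELECT ID + @MaxID
FROM dbo.TableHere;

Each run doubles the row count, so reaching 2 billion rows takes roughly 31 runs from a single seed row; by then the ID values are near the int limit (a bigint column may be safer), and each later pass rewrites an ever larger table, so the batched approaches in the answer below are typically faster.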

1 Answer


Below is a method using CROSS JOIN in batches of 10M. This loaded 2 billion rows in about 6 minutes on my desktop machine.

--create and load 2B row table using cross join CTE source
CREATE TABLE dbo.TableHere(
    ID int NOT NULL
);

DECLARE
      @TargetRowCount int = 2000000000
    , @RowsInserted int = 0;

WHILE @RowsInserted < @TargetRowCount
BEGIN

    WITH 
         t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n)) --10 rows
        ,t1k AS (SELECT 0 AS n FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c) --1,000 rows (10x10x10)
        ,t10m AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t1k AS a
        CROSS JOIN t1k AS b CROSS JOIN t10 AS c) --10,000,000 rows (1,000x1,000x10)
    INSERT INTO dbo.TableHere WITH(TABLOCKX)
        SELECT num + @RowsInserted
        FROM t10m;

    SET @RowsInserted += @@ROWCOUNT;

    DECLARE @CurrentTimestampString varchar(1000) = FORMAT(SYSDATETIME(),'yyyy-MM-dd HH:mm:ss');
    RAISERROR('%s: %d of %d rows inserted', 0, 0, @CurrentTimestampString, @RowsInserted, @TargetRowCount) WITH NOWAIT;

END;
GO

Consider creating a permanent tally table with incremental values. Not only will this improve the performance of test data generation, but a materialized tally table can also be reused for a variety of other common tasks, such as generating incremental value ranges, datetime sequences, etc. (a reuse sketch follows the script below).

Leveraging the tally table improved performance by about 30% (roughly 4.5 minutes) on my machine by avoiding repeating the cross join on every iteration. EDIT: down to about 2.5 minutes after a processor upgrade (Intel i9-12900K).

SET NOCOUNT ON;
--create and load 10M row tally table
DROP TABLE IF EXISTS dbo.Tally;
CREATE TABLE dbo.Tally(
     Num int NOT NULL CONSTRAINT PK_Tally PRIMARY KEY
);
WITH 
     t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
    ,t1k AS (SELECT 0 AS n FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c)
    ,t10m AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t1k AS a
    CROSS JOIN t1k AS b CROSS JOIN t10 AS c)
INSERT INTO dbo.Tally WITH(TABLOCKX) (Num) 
    SELECT num 
    FROM t10m;
ALTER INDEX PK_Tally ON dbo.Tally REBUILD WITH (FILLFACTOR=100);
GO

--create and load 2B row table using tally table source
DROP TABLE IF EXISTS dbo.TableHere;
CREATE TABLE dbo.TableHere(
    ID int NOT NULL
);

DECLARE
      @TargetRowCount int = 2000000000
    , @RowsInserted int = 0;

WHILE @RowsInserted < @TargetRowCount
BEGIN

    INSERT INTO dbo.TableHere WITH(TABLOCKX) (ID)
        SELECT Num + @RowsInserted
        FROM dbo.Tally;

    SET @RowsInserted += @@ROWCOUNT;

    DECLARE @CurrentTimestampString varchar(1000) = FORMAT(SYSDATETIME(),'yyyy-MM-dd HH:mm:ss');
    RAISERROR('%s: %d of %d rows inserted', 0, 0, @CurrentTimestampString, @RowsInserted, @TargetRowCount) WITH NOWAIT;

END;
GO
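As a quick illustration of the tally-table reuse mentioned above, the same dbo.Tally table can drive other sequences. A minimal sketch generating a daily datetime sequence (the start date and row count are arbitrary, purely for illustration):

--generate one row per day for a year starting from an arbitrary date
SELECT DATEADD(day, Num - 1, CAST('2018-01-01' AS date)) AS CalendarDate
FROM dbo.Tally
WHERE Num <= 365
ORDER BY Num;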
Dan Guzman