1

I'm trying to insert 100,000,000 simple coordinate based records into a table. Is there a faster way to make this happen than the t-sql command below

declare @x int
declare @y int
set @x = 0
set @y = 0

begin tran
while @x < 10000
begin
    while @y < 10000
    begin
        insert into world (x,y) VALUES (@x,@y)
        set @y = @y + 1
    end
    set @y = 0
    set @x = @x + 1
end
commit tran
Pondlife
  • 15,992
  • 6
  • 37
  • 51
mxmissile
  • 11,464
  • 3
  • 53
  • 79
  • 2
    DO you have those coordinates available in an external file? You could bulkload that into a staging table.... – marc_s Apr 30 '13 at 17:11
  • I do not, but I suspect creating a file would take as long as the query above. Time to experiment. – mxmissile Apr 30 '13 at 17:21
  • Rather than 100,000,000 individual INSERTs, you probably would be better off with either INSERT INTO SELECT syntax or SELECT INTO syntax. Of those, I would guess SELECT INTO might perform better. See [here](http://www.blackwasp.co.uk/SQLSelectInsert.aspx), for example. – mbeckish Apr 30 '13 at 17:28
  • I see you have and accepted answer. But another option might be insert 10000 rows into a #temp and do a cross join. – paparazzo Apr 30 '13 at 19:34
  • I have to say that the answer by @StingyJack appears to be a faster solution than mine, even when inserting the final result into a permanent table rather than #values. If this is something you do regularly, you may want to experiment a little bit with that. –  Apr 30 '13 at 22:21
  • 1
    I'm going to, yours worked fine, it was a one off thing. I'm still interested in his solution, will get some time this weekend for my curiosity run some benchmarks. – mxmissile May 01 '13 at 15:44

4 Answers4

3

If you have Number table, please use your own Number table. Otherwise, you can use spt_values like below.

WITH base_num AS
(SELECT number FROM master..spt_values WHERE type = 'P' AND number < 100)
, num AS 
(SELECT b1.number * 100 + b2.number AS number
 FROM base_num b1
 CROSS JOIN base_num b2
) 
INSERT INTO world (x,y)
SELECT n1.number AS x, n2.number AS y 
FROM num n1
CROSS JOIN num n2
EricZ
  • 6,065
  • 1
  • 30
  • 30
  • spt_values is undocumented and could give unexpected results. In this case it may not produce enough to generate the records required. – StingyJack Apr 30 '13 at 17:28
  • @StingyJack, as I mentioned in my answer, if OP have own number table, then should use it. Also, could you please give a example that "spt_values could give unexpected results". I would really interest to know it. – EricZ Apr 30 '13 at 17:34
  • Its considered an "internal" and thus not documented. If there is no documentation, then you are inferring the behavior and it could do something unexpected. http://stackoverflow.com/a/4280038/16391 – StingyJack Apr 30 '13 at 17:35
  • 1
    @StingyJack, if I read the link correctly, the answer says "Why we should use spr_values", not opposite. – EricZ Apr 30 '13 at 17:47
  • The title of the link is unimportant. The point was that you are recommending an undocumented method that may produce unexpected or non-guarantee-able results, and that only lists about 2500 records each unless you do something else. 2500 elements squared is only going to get about 6MM rows, and OP needs 100MM. – StingyJack Apr 30 '13 at 19:21
2

Something like this might work if 100 mill rows isn't too big a mouthful for the server to do in one piece - it obviously depends on the size and speed of your log-file.

WITH counter AS 
(SELECT TOP 10000 ROW_NUMBER() 
   OVER (ORDER BY a.[object_id], a.name, b.[object_id]) AS rownum 
   FROM sys.columns a, sys.columns b) 
INSERT INTO World (x,y) 
SELECT a.rownum, b.rownum 
  FROM counter a, counter b
2

You can use a numbers table

 SELECT TOP 10000 -- use a smaller value for testing, this will take a bit
        IDENTITY(INT,1,1) as N
   INTO #Numbers
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2

Then something like

SELECT
    n1.N as 'N1'
    , n2.N as 'N2'
INTO #values
FROM #Numbers n1
    CROSS JOIN #Numbers n2


SELECT COUNT(*) FROM #values

To get the cartesian of the sets, provided you can use simplified coordinate values.

StingyJack
  • 19,041
  • 10
  • 63
  • 122
  • 1
    It took about 12 seconds to generate a million numbers. I only have a shared VM SQL Server to use, so I dont dare try 100MM. – StingyJack Apr 30 '13 at 17:37
2

If you don't have a numbers table, you can simulate one with a CTE:

with cte as
(select 1 i union all select i+1 i from cte where i < 10000)
INSERT into World (x,y) 
SELECT x.i, y.i 
from cte x cross join cte y
option (maxrecursion 0)