Table with million rows, two columns.
code | name
xyz | product1
abc | Product 2
...
...
I want to do insert in small batches (10000) via the insert into/select query.
How can we do this when there is no identity key to create a batch?
Table with million rows, two columns.
code | name
xyz | product1
abc | Product 2
...
...
I want to do insert in small batches (10000) via the insert into/select query.
How can we do this when there is no identity key to create a batch?
You could use a LEFT OUTER JOIN
in your SELECT
statement to identify records that are not already in the INSERT
table, then use TOP
to grab the first 10000 that the database finds. Something like:
INSERT INTO tableA
SELECT TOP 10000 code, name
FROM tableB LEFT OUTER JOIN tableA ON tableB.Code = tableA.Code
WHERE tableA.Code IS NULL;
And then run that over and over and over again until it's full.
You could also use Windowing functions to batch like:
INSERT INTO tableA
SELECT code, name
FROM (
SELECT code, name, ROW_NUMBER() OVER (ORDER BY name) as rownum
FROM tableB
)
WHERE rownum BETWEEN 1 AND 100000;
And then just keep changing the BETWEEN
to get your batch. Personally, if I had to do this, I would use the first method though since it's guaranteed to catch everything that isn't already in TableA
.
Also, if there is the possibility that tableb
will gain records during this batching process, then option 1 is definitely better. Essentially, with option2, it will determine the row_number() on the fly, so newly inserted records will cause records to be missed if they show up in the middle of batches.
If TableB
is static, then Option 2 may be faster since the DB just has to sort and number the records, instead of having to join HUGE table to HUGE table and then grab 10000 records.
You can do the pagination on SELECT and select the records by batch/page size of say 10000 or whatever you need and insert in the target table. In the below sample you will have to change the value of @Min and @Max for each iteration of the batch size you desire to have.
INSERT INTO EmployeeNew
SELECT Name
FROM
(
SELECT DENSE_RANK OVER(ORDER BY EmployeeId) As Rank, Name
FROM Employee
) AS RankedEmployee
WHERE Rank >= @Min AND Rank < @Max