
I have two tables. They both have identical structures except table2 has an additional column. I currently copy data from table1 into table2 using a stored proc, as shown below.

However, due to the sheer number of records (20 million+) and the structure of the stored proc, this currently takes a couple of hours to run.

Does anyone have any suggestions on how to optimize the code?

CREATE PROCEDURE dbo.insert_period @period INT AS

DECLARE @batchsize INT
DECLARE @start INT
DECLARE @numberofrows INT

SELECT @numberofrows = COUNT(*) from daily_table

SET @batchsize = 150000
SET @start = 1

WHILE @start < @numberofrows
BEGIN
    INSERT INTO dbo.main_table WITH (TABLOCK) (
    col1,
    col2,
    ....,
    col26,
    time_period
    )
    SELECT *, @period FROM dbo.daily_table
    ORDER BY id

    OFFSET @start ROWS
            FETCH NEXT @batchsize ROWS ONLY

    SET @start += @batchsize + 1
END

The id that I am using here is not unique. The table itself does not have any keys or unique IDs.

Chris Albert

2 Answers


First, I would like to point out that the logic in your insert is flawed.

With @start starting at 1, you're always skipping the first row of the source table. Then adding 1 to it at the end of your loop (SET @start += @batchsize + 1) causes it to skip another row on each subsequent iteration.

If you're set on using batched inserts, I suggest you read up on how they work over on MSSQLTips.
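As a sketch of the fix (column names elided as in your original proc), start @start at 0 and advance it by exactly the batch size so no rows are skipped:

```sql
-- Sketch: corrected paging logic. Start at 0 and advance by exactly
-- @batchsize; the original code skipped the first row and one extra
-- row per iteration.
SET @batchsize = 150000;
SET @start = 0;

WHILE @start < @numberofrows
BEGIN
    INSERT INTO dbo.main_table WITH (TABLOCK) (col1, col2, /* ... */ col26, time_period)
    SELECT col1, col2, /* ... */ col26, @period
    FROM dbo.daily_table
    ORDER BY id
    OFFSET @start ROWS FETCH NEXT @batchsize ROWS ONLY;

    SET @start += @batchsize;
END;
```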

To help you with performance I would suggest taking a look at the following:

SELECT *

Remove the SELECT * and replace it with the column names. This will help the optimizer give you a better query plan. Further reading on why SELECT * is bad can be found in this SO question.

ORDER BY

That ORDER BY is probably slowing you down, though without seeing your query plan we cannot know for sure. Each time your loop executes, it queries the source table and has to sort all those records. Sorting 20+ million records that many times is a lot of work. Take a look at my simplified example below.

CREATE TABLE #Test (Id INT);
INSERT INTO #Test VALUES (1), (2), (3), (4), (5);

DECLARE @batchsize INT;
DECLARE @start INT;
DECLARE @numberofrows INT;

SELECT  @numberofrows = COUNT(*) FROM   #Test;

SET @batchsize = 2;
SET @start = 0;

WHILE @start < @numberofrows
BEGIN
    SELECT
        *
        , 10
    FROM
        #Test
    ORDER BY
        Id OFFSET @start ROWS FETCH NEXT @batchsize ROWS ONLY;

    SET @start += @batchsize;
END;

Below is a portion of the query plan produced by the sample. Notice the Sort operation highlighted in yellow. Its cost accounts for 78% of that query plan.

[Query plan screenshot: the Sort operator, highlighted in yellow, accounts for 78% of the plan cost]

If we add an index on the Id column of the source table, the data is already stored in sorted order and the Sort operator can be eliminated. Now when the loop runs it doesn't have to do any sorting.

CREATE INDEX ix_Test ON #Test (Id);

[Query plan screenshot: with the index in place, the Sort operator is gone]

Other Options to Research

  1. Columnstore Indexes
  2. Batch Mode in RowStore
  3. Parallel Inserts
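As a sketch of the first option (assuming main_table is a suitable candidate; verify this fits your workload and SQL Server edition before applying):

```sql
-- Sketch: a clustered columnstore index on the target table.
-- Columnstore compresses large tables heavily and enables batch mode
-- execution for big scans and inserts (SQL Server 2016+).
CREATE CLUSTERED COLUMNSTORE INDEX cci_main_table
    ON dbo.main_table;
```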
Chris Albert

You copy the table in many small batches, which is why it takes so long. The simplest way to achieve what you want is a single INSERT combined with a SELECT statement. This way, you insert all the data in one batch.

CREATE TABLE dbo.daily_table (id INT PRIMARY KEY IDENTITY,
                              value1 NVARCHAR(100) NULL,
                              value2 NVARCHAR(100) NULL);
GO

CREATE TABLE dbo.main_table (id INT PRIMARY KEY IDENTITY,
                             value1 NVARCHAR(100) NULL,
                             value2 NVARCHAR(100) NULL,
                             value3 NVARCHAR(100) NULL);
GO

INSERT INTO dbo.daily_table (value1, value2)
VALUES('1', '2');

-- Insert with Select
INSERT INTO dbo.main_table (value1, value2)
SELECT  value1,     value2
FROM    dbo.daily_table;

Also, it's better not to use an asterisk in your SELECT statement, since the results could be unpredictable if the table definition changes.
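Applied to the tables in the question (column names elided as in the original; @period supplies the extra time_period value, and the TABLOCK hint from the question can help SQL Server minimally log the insert), a single-statement version might look like:

```sql
-- Sketch: one INSERT ... SELECT for the question's tables, appending
-- @period as the extra time_period column. TABLOCK can enable
-- minimal logging when the recovery model allows it.
INSERT INTO dbo.main_table WITH (TABLOCK) (col1, col2, /* ... */ col26, time_period)
SELECT col1, col2, /* ... */ col26, @period
FROM dbo.daily_table;
```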

Ivan