
Basically I need to run this on a table with 40 million rows. Updating every row at once will crash, so I want to batch the query so that if it crashes, I can re-run it and it will skip the finished batches and just continue with the rows left over.

UPDATE [table]
SET [New_ID] = [Old_ID]

What is the fastest way to do this? Here is how the table is created:

CREATE TABLE [table](
    [INSTANCE_ID] [int] NOT NULL,
    [table_ID] [bigint] IDENTITY(1,1) NOT NULL,
    [old_ID] [bigint] NOT NULL,
    [new_ID] [bigint] NOT NULL,
    [owner_ID] [int] NOT NULL,
    [created_time] [datetime] NULL
) ON [PRIMARY]

There are also indexes on created_time and owner_ID.

EDIT: My update statement is EXACTLY as shown; I literally just need to copy every entry in old_ID into new_ID for 40 million rows.

Bill Software Engineer

4 Answers

DECLARE @Rowcount INT = 1;

WHILE (@Rowcount > 0)
BEGIN
    UPDATE TOP (100000) [table]   --<-- define the batch size in the TOP clause
       SET [New_ID] = [Old_ID]
     WHERE [New_ID] <> [Old_ID];

    SET @Rowcount = @@ROWCOUNT;

    CHECKPOINT;   --<-- to commit the changes with each batch
END
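
If the loop is interrupted, it is safe to simply run it again: the WHERE [New_ID] <> [Old_ID] filter skips rows that were already updated. As a rough progress check from another session, you could count the rows still to be updated; this is only a sketch, and it scans the table, so it is not free:

SELECT COUNT_BIG(*) AS rows_remaining
FROM [table] WITH (NOLOCK)   -- dirty read is acceptable for a rough progress estimate
WHERE [New_ID] <> [Old_ID];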
M.Ali

M.Ali's suggestion will work, but performance will degrade as you work through the 40M records, because each pass has to scan past the rows that have already been updated. I would suggest a better filter to find the records to update in each pass. This assumes you have a primary key (or other index) on your identity column:

DECLARE @BatchSize BIGINT = 100000
    ,   @StartingRecord BIGINT = 1
    ,   @MaxRecord BIGINT;

-- Upper bound for the loop, so gaps in the identity values
-- can't end the loop before the last rows are reached.
SELECT @MaxRecord = MAX([table_ID]) FROM [table];

WHILE (@StartingRecord <= @MaxRecord)
BEGIN
    UPDATE [table]
        SET [New_ID] = [Old_ID]
    WHERE [table_ID] BETWEEN @StartingRecord AND @StartingRecord + @BatchSize - 1;

    CHECKPOINT;

    SET @StartingRecord += @BatchSize;
END

This approach allows each iteration to be as fast as the first. And if you don't have a valid index on [table_ID], you need to fix that first.
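
If [table_ID] is not already indexed, the natural fix is a clustered primary key on it. A minimal sketch follows; the constraint name is just illustrative, and adding it rebuilds the heap, so expect it to take a while on 40M rows:

ALTER TABLE [table]
    ADD CONSTRAINT [PK_table_table_ID] PRIMARY KEY CLUSTERED ([table_ID]);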

btberry
SELECT 1;  -- seeds @@ROWCOUNT so the WHILE condition is true on the first pass
WHILE (@@ROWCOUNT > 0)
BEGIN
    UPDATE TOP (1000000) [table]
       SET [New_ID] = [Old_ID]
     WHERE [New_ID] <> [Old_ID]
        OR ([New_ID] IS NULL AND [Old_ID] IS NOT NULL);
END

100,000 may work better as the TOP value.

Since [New_ID] and [Old_ID] are NOT NULL, the IS NULL check is not necessary.

paparazzo

The fastest way is to:

1) Create a new staging table and copy every row from the old table into it with a SELECT ... INTO statement, populating [new_ID] from [old_ID] as part of the select.

2) Copy the constraints and recreate the indexes on the new table.

3) Drop the old table.

4) Rename the new table to the original name.

A complete discussion is available at this link.
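
A rough sketch of this rebuild, with an illustrative name ([table_new]) and assuming nothing writes to the table while it runs; note that SELECT ... INTO does not carry over indexes or constraints, so they must be recreated before the swap:

SELECT [INSTANCE_ID],
       [table_ID],
       [old_ID],
       [old_ID] AS [new_ID],   -- populate new_ID from old_ID during the rebuild
       [owner_ID],
       [created_time]
INTO   [table_new]
FROM   [table];

-- Recreate the indexes on created_time and owner_ID (and any constraints) on [table_new],
-- then swap the tables.
DROP TABLE [table];
EXEC sp_rename 'table_new', 'table';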