
Basically I need to run this on a table with 40 million rows. Updating every row at once will crash, so I want to batch the query so that if it crashes, I can re-run it and it will skip the finished batches and just continue with the rows left over.

UPDATE [table]
SET [New_ID] = [Old_ID]

What is the fastest way to do this? Here is how the table is created:

CREATE TABLE [table](
    [INSTANCE_ID] [int] NOT NULL,
    [table_ID] [bigint] IDENTITY(1,1) NOT NULL,
    [old_ID] [bigint] NOT NULL,
    [new_ID] [bigint] NOT NULL,
    [owner_ID] [int] NOT NULL,
    [created_time] [datetime] NULL
) ON [PRIMARY]

There are also indexes on created_time and owner_ID.

EDIT: My update statement is EXACTLY as shown; I literally just need to copy every entry in old_ID into new_ID for 40 million rows.

Bill Software Engineer

4 Answers

DECLARE @Rowcount INT = 1;

WHILE (@Rowcount > 0)
BEGIN
    UPDATE TOP (100000) [table]   --<-- define the batch size in the TOP clause
       SET [New_ID] = [Old_ID]
     WHERE [New_ID] <> [Old_ID];

    SET @Rowcount = @@ROWCOUNT;

    CHECKPOINT;   --<-- to commit the changes with each batch
END
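
If the loop is interrupted, it is safe to simply run it again: the WHERE [New_ID] <> [Old_ID] filter skips rows that were already updated. As a rough progress check from another session, you could count the rows still to be updated; this is only a sketch, and it scans the table, so it is not free:

SELECT COUNT_BIG(*) AS rows_remaining
FROM [table] WITH (NOLOCK)   -- dirty read is acceptable for a rough progress estimate
WHERE [New_ID] <> [Old_ID];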
M.Ali

M.Ali's suggestion will work, but performance will degrade as you work through the 40M records, because each pass has to scan past the rows that have already been updated. I would suggest a better filter to find the records to update in each pass. This assumes you have a primary key (or other index) on your identity column:

DECLARE @BatchSize BIGINT = 100000
    ,   @StartingRecord BIGINT = 1
    ,   @MaxRecord BIGINT;

-- Upper bound for the loop, so gaps in the identity values
-- can't end the loop before the last rows are reached.
SELECT @MaxRecord = MAX([table_ID]) FROM [table];

WHILE (@StartingRecord <= @MaxRecord)
BEGIN
    UPDATE [table]
        SET [New_ID] = [Old_ID]
    WHERE [table_ID] BETWEEN @StartingRecord AND @StartingRecord + @BatchSize - 1;

    CHECKPOINT;

    SET @StartingRecord += @BatchSize;
END

This approach allows each iteration to be as fast as the first. And if you don't have a valid index on [table_ID], you need to fix that first.
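
If [table_ID] is not already indexed, the natural fix is a clustered primary key on it. A minimal sketch follows; the constraint name is just illustrative, and adding it rebuilds the heap, so expect it to take a while on 40M rows:

ALTER TABLE [table]
    ADD CONSTRAINT [PK_table_table_ID] PRIMARY KEY CLUSTERED ([table_ID]);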

btberry
SELECT 1;  -- seeds @@ROWCOUNT so the WHILE condition is true on the first pass
WHILE (@@ROWCOUNT > 0)
BEGIN
    UPDATE TOP (1000000) [table]
       SET [New_ID] = [Old_ID]
     WHERE [New_ID] <> [Old_ID]
        OR ([New_ID] IS NULL AND [Old_ID] IS NOT NULL);
END

100,000 may work better as the TOP value.

Since [New_ID] and [Old_ID] are NOT NULL, the IS NULL check is not necessary.

paparazzo

The fastest way is to:

1) Create a new staging table and copy every row from the old table into it with a SELECT ... INTO statement, populating [new_ID] from [old_ID] as part of the select.

2) Copy the constraints and recreate the indexes on the new table.

3) Drop the old table.

4) Rename the new table to the original name.

A complete discussion is available at this link.
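
A rough sketch of this rebuild, with an illustrative name ([table_new]) and assuming nothing writes to the table while it runs; note that SELECT ... INTO does not carry over indexes or constraints, so they must be recreated before the swap:

SELECT [INSTANCE_ID],
       [table_ID],
       [old_ID],
       [old_ID] AS [new_ID],   -- populate new_ID from old_ID during the rebuild
       [owner_ID],
       [created_time]
INTO   [table_new]
FROM   [table];

-- Recreate the indexes on created_time and owner_ID (and any constraints) on [table_new],
-- then swap the tables.
DROP TABLE [table];
EXEC sp_rename 'table_new', 'table';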