We recently had a case where we wanted to do something similar, but slowly over days (updating only a certain number of records per run, and only during certain hours). The most recent data was fine, but millions of rows of older data needed to be updated. Our data table looks something like this:
Create Table FileContent
(
FileContent varchar(max),
File_PK bigint,
NewFileContent varchar(max)
)
And we only needed to update certain rows, but millions of them. We created a table to store our progress so we could use a scheduled job to iterate through and update the main table, then populated this table with the primary keys of the main table records that needed updating:
Create Table FilesToUpdate
(
File_PK bigint,
IsUpdated bit NOT NULL DEFAULT 0
)
Then we scheduled the following script to do the updating (for your own use, play with the batch sizing and scheduling for what works with your system).
/***
Script to update and fix records.
***/
DECLARE @Rowcount INT = 1 --
, @BatchSize INT = 100 -- how many rows will be updated on each iteration of the loop
, @BatchesToRun INT = 25 -- the max number of times the loop will iterate
, @StartingRecord BIGINT = 1;
-- Get the highest File_PK not already fixed as a starting point.
Select @StartingRecord = MAX(File_PK) From FilesToUpdate where IsUpdated = 0
-- While there are still rows to update and we haven't hit our limit on iterations...
WHILE (@Rowcount > 0 and @BatchesToRun > 0)
BEGIN
print Concat('StartingRecord (Start of Loop): ', @StartingRecord)
UPDATE FileContent SET NewFileContent = 'New value here'
WHERE File_PK BETWEEN (@StartingRecord - @BatchSize + 1) AND @StartingRecord;
-- @@Rowcount is the number of records affected by the last statement. If this returns 0, the loop will stop because we've run out of things to update.
SET @Rowcount = @@ROWCOUNT;
print Concat('RowCount: ', @Rowcount)
-- Record which PKs were updated so we know where to start next time around.
UPDATE FilesToUpdate Set IsUpdated = 1 where File_PK BETWEEN (@StartingRecord - @BatchSize + 1) AND @StartingRecord;
-- The loop will stop after @BatchSize*@BatchesToRun records are updated.
-- If there aren't that many records left to update, the @Rowcount checks will stop it.
SELECT @BatchesToRun = @BatchesToRun - 1
print Concat('Batches Remaining: ',@BatchesToRun)
-- Set the starting record for the next time through the loop.
SELECT @StartingRecord -= @BatchSize
print Concat('StartingRecord (End of Loop): ', @StartingRecord)
END