2

I'm doing a MS SQL Server update of a row in a table that's very simple. I'm replacing about 4 things with another 4 things.

update Table set Column1 = 'something new' where Column1 = 'something old';
update Table set Column1 = 'something new 2' where Column1 = 'something old 2';
update Table set Column1 = 'something new 3' where Column1 = 'something old 3';
update Table set Column1 = 'something new 4' where Column1 = 'something old 4';

That's really all there is to it. But my question is, this is a table with a huge number of records running in production, but the exact number is unknown before running the updates. There is a timestamp column. And it's probably more important to update the most recent ones first.

But my question is probably a more prectical one.

Is it best to partition this up by timestamp and let it run manually, or is there a better method of letting this run? I can also divide up the work by each update statement.

Or is there some way to put such a thing into the script itself?

I've tried looking at the plans for the queries, but it doesn't tell me the best way to split it up.

mark b
  • 213
  • 2
  • 7

1 Answers1

4

Use Update Top

You can update data as chunks using a while loop and Update Top option:

WHILE 1 = 1
BEGIN
    UPDATE top (1000) tableToUpdate
    SET Column1 = 'something new'
    WHERE 
       Column1 = 'something old';

    if @@ROWCOUNT < 1000 BREAK
END

When @@ROWCOUNT is less than 1000 which is the chunk size it implies that all rows are updated.

Note That, based on the official documentation:

The rows referenced in the TOP expression used with INSERT, UPDATE, or DELETE are not arranged in any order.

Update using TOP and Order BY

If you are looking to update sorted data based on a timestamp, in the official documentation they mentioned that:

If you must use TOP to apply updates in a meaningful chronology, you must use TOP together with ORDER BY in a subselect statement.

As Example:

WHILE 1 = 1
BEGIN
    UPDATE tableToUpdate
    SET Column1 = 'something new'
    FROM (SELECT TOP 1000 IDColumn FROM tableToUpdate WHERE tableToUpdate.Column1 = 'something old' ORDER BY TimeStamp DESC) tto
    WHERE 
       tableToUpdate.ID = tto.ID;

    if @@ROWCOUNT < 1000 BREAK
END

Other helpful links

Hadi
  • 36,233
  • 13
  • 65
  • 124
  • Nice and simple, but it doesn't allow an ORDER BY, at least not according to the docs. You would think it would, using TOP. – Tab Alleman Feb 11 '19 at 20:26
  • @TabAlleman you're right you cannot order by, in the documentation they mentioned that `The rows referenced in the TOP expression used with INSERT, UPDATE, or DELETE are not arranged in any order.` – Hadi Feb 11 '19 at 20:30
  • @TabAlleman but they also mentioned that `If you must use TOP to apply updates in a meaningful chronology, you must use TOP together with ORDER BY in a subselect statement. The following example updates the vacation hours of the 10 employees with the earliest hire dates.` – Hadi Feb 11 '19 at 20:33
  • I see, but doesn't the `tto` subselect need the `Column1` filter in its WHERE clause? – Tab Alleman Feb 11 '19 at 20:50
  • 2
    It would be better to "partition" the updates by an indexed column to prevent scanning the table multiple times. – Luis Cazares Feb 11 '19 at 21:34
  • @LuisCazares i agree with that. But the OP didn't mentioned that he is using partitions – Hadi Feb 11 '19 at 21:35
  • I'm not talking about real partitioning, just the way to limit the rows to be evaluated on each iteration of the update. In the solution, the table will need to read more and more rows as it finds the ones that need to be updated, instead of seeking the actual rows by a range in an indexed column. – Luis Cazares Feb 11 '19 at 21:38
  • @LuisCazares then i think you are talking about the second approach – Hadi Feb 11 '19 at 21:46