I have a SQL Server database with a table called searches_tweets that has hundreds of millions of rows. The table has a search_id and a datetime_created_at column (it also has datetime_modified_at and tweet_id columns).
I need to update the search_id of 270 million rows. These rows all have a certain value of search_id and a datetime_created_at after a certain date (I need to change the search_id of every row created after that date).
This is the first query I tried:
update searches_tweets
set search_id = 123
where search_id = 456
and datetime_created_at > '2020-01-07'
It was really slow and, after a few reconnection attempts (with none of the changes ever being committed), I decided to switch to a batched update with a commit after every 4,000 updated rows:
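-- Each UPDATE statement below runs as its own transaction (default autocommit
-- mode), so every batch of @BatchSize rows is committed as soon as that
-- statement completes.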
DECLARE @Rows INT,
        @BatchSize INT;

SET @BatchSize = 4000;
SET @Rows = @BatchSize;

BEGIN TRY
    WHILE (@Rows = @BatchSize)
    BEGIN
        UPDATE TOP (@BatchSize) tab
        SET tab.search_id = 123
        FROM searches_tweets tab
        WHERE tab.search_id = 456
          AND tab.datetime_created_at > '2020-01-07'
          AND tab.search_id <> 123;

        SET @Rows = @@ROWCOUNT;
    END;
END TRY
BEGIN CATCH
    RAISERROR(
        N'Error message',
        10,
        1
    );
    RETURN;
END CATCH;
The table has 3 indexes:
- One on columns search_id and tweet_id
- One on column search_id
- One on columns datetime_modified_at and datetime_created_at
The table also has a composite primary key that includes both the search_id and tweet_id columns.
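For reference, the schema roughly looks like this (the data types and index names below are approximate placeholders, not the exact definitions):

-- Approximate shape of the table and its indexes:
CREATE TABLE searches_tweets (
    search_id            BIGINT    NOT NULL,
    tweet_id             BIGINT    NOT NULL,
    datetime_created_at  DATETIME2 NOT NULL,
    datetime_modified_at DATETIME2 NOT NULL,
    CONSTRAINT PK_searches_tweets PRIMARY KEY (search_id, tweet_id)
);

CREATE INDEX IX_search_tweet     ON searches_tweets (search_id, tweet_id);
CREATE INDEX IX_search           ON searches_tweets (search_id);
CREATE INDEX IX_modified_created ON searches_tweets (datetime_modified_at, datetime_created_at);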
This runs faster, but it still takes too long because of the number of rows I have to update. Can you think of any way to do this faster?
Thank you!