
I am working in SQL Server 2012. I have the following DML:

BEGIN TRAN
UPDATE A
SET
A.colx = B.coly
FROM DB1.dbo.Table1 A
INNER JOIN DB2.dbo.Table2 B
    ON B.col1 = A.col1
    AND B.col2 = A.col2
WHERE
A.colx <> B.coly
AND B.col3 IN ('a', 'b', 'c', 'd', 'e', 'f')
;

It takes an unacceptably long time to run. After an hour, I had to cancel the query. So, I need to tune it.

I tried to inspect the query plan, but I get a SHOWPLAN permission denied message on DB2. So, I ran the following query:

SELECT
COUNT(*)
FROM DB1.dbo.Table1 A
INNER JOIN DB2.dbo.Table2 B
    ON B.col1 = A.col1
    AND B.col2 = A.col2
WHERE
A.colx <> B.coly
AND B.col3 IN ('a', 'b', 'c', 'd', 'e', 'f')

This query returns in about 1 minute, and COUNT(*) = 60M. So, I don't think it's the read operation that is causing the first query to run so slowly; the write operation is the bottleneck.

Table A is an 800M-row, 75 GB table. I have been told that I don't have the disk space to use a SELECT INTO technique. It has a non-clustered index on its primary key; col1 and col2 are part of this primary key. It also has other non-clustered indexes and a clustered index. Normally, because of the clustered index, I would follow the "non-simple" technique given here: Fastest way to update 120 Million records. However, I don't have other schemas available to me in DB1, so I can't use the "non-simple" technique. Table B is a 10M-row table with no indexes and no keys.
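Since Table B has no indexes at all, one thing worth trying before restructuring the update is a supporting index on B. This is a sketch only; the index name is made up, and the column list is taken from the join and filter in the query above:

```sql
-- Hypothetical supporting index on Table2 (name is illustrative).
-- Keys cover the join (col1, col2) and the filter (col3);
-- INCLUDE (coly) lets the join read the new value without a lookup.
CREATE NONCLUSTERED INDEX IX_Table2_col1_col2_col3
ON DB2.dbo.Table2 (col1, col2, col3)
INCLUDE (coly);
```

At 10M rows this index should build quickly and may change the join from a scan of B per probe into a seek.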

What is the best way to tune this UPDATE statement? Perhaps batches?
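A common pattern for this situation is to batch the UPDATE with TOP, committing after each batch so the transaction log stays small and locks are released. The sketch below assumes the tables and columns from the question; the batch size of 50,000 is an assumption you would tune. Because an updated row no longer satisfies `A.colx <> B.coly`, each pass naturally picks up the next unprocessed rows, so no key-range bookkeeping is needed:

```sql
-- Batched UPDATE sketch (assumes the schema from the question).
DECLARE @BatchSize INT = 50000;  -- assumed starting point; tune as needed
DECLARE @Rows INT = 1;

WHILE @Rows > 0
BEGIN
    BEGIN TRAN;

    UPDATE TOP (@BatchSize) A
    SET A.colx = B.coly
    FROM DB1.dbo.Table1 A
    INNER JOIN DB2.dbo.Table2 B
        ON B.col1 = A.col1
        AND B.col2 = A.col2
    WHERE A.colx <> B.coly
      AND B.col3 IN ('a', 'b', 'c', 'd', 'e', 'f');

    SET @Rows = @@ROWCOUNT;  -- 0 when no qualifying rows remain

    COMMIT TRAN;
END
```

If the database is in FULL recovery, taking log backups between batches keeps the log from growing; in SIMPLE recovery the commit alone lets the log space be reused.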

skyline01
  • There is nothing wrong with removing the `A.colx <> B.coly` condition. This comparison is very slow because it requires a table scan. – Alex Kudryashev May 10 '16 at 18:11
  • I understand that your code sample would be a snippet, but you do have a `BEGIN TRAN` without a subsequent `COMMIT TRAN` – Steven May 10 '16 at 18:12
  • 1
    I would try using batches of 50k. – Tab Alleman May 10 '16 at 18:20
  • @Steven: Yes, I know that I am missing the COMMIT TRAN. The actual script has both a COMMIT TRAN and a ROLLBACK TRAN. – skyline01 May 10 '16 at 18:20
  • You could break it down with a `CURSOR`; perhaps that gives you more room to monitor the progress. – yazanpro May 10 '16 at 20:45
  • I've run into problems where keeping the indexes in sync with the updates kills the performance, so disabling the indexes, running the update, and then re-creating the indexes can actually perform faster. I believe it has to do with the location of the data/indexes on the physical drives. And you're not updating 60M rows; you're updating 60M × the number of indexes involved. http://stackoverflow.com/questions/751039/slow-bulk-insert-for-table-with-many-indexes – xQbert May 10 '16 at 20:47

0 Answers