How to delete a massive amount of data from several related tables in SQL on MS Server

Question

Given a "main" table which has a single primary key, from which a huge number of rows need to be deleted (perhaps about 200M). In addition, there are about 30 "related" tables that are related to the main table, and related rows must also be deleted from each. It is expected that about an equivalent huge number of rows (or more) would need to be deleted from each of the related tables.

Of course it's possible to change the condition to partition the amount of data to be deleted, and run it several times, but in any case, I need an efficient solution to do this.

John Rees suggests a way to do massive deletes in a single table in Delete Large Number of Rows Is Very Slow - SQL Server , but the problem with that is that it performs several transactional deletes in a single table. This could potentially leave the db in an inconsistent state.

John Gibb suggests a way to delete from several related tables, in How do I delete from multiple tables using INNER JOIN in SQL server , but it does not consider the possibility that the amount of data to be deleted from each of these tables is large.

How can these two solutions be combined into an efficient way to delete a large number of rows from several related tables? (I'm new to SQL)

Perhaps it's important to note that, in the scope of this problem, each "related" table is only related to the "main" table

I'm guessing `on delete cascade` is not specified in tge foreign key constraints... — Zohar Peled, May 09 '17 at 18:10
If you don't have `on delete cascade` enabled as mentioned, you're best option is the batched delete, and stick to batches under 5000 rows, or you risk locking the table (thanks marc_s!). You could write the batch so that it deletes the same 4500 rows from each related table then the main in one loop, rather than cleaning all 200M from each related table before moving on to the main table. -- I posted a similar batch delete answer here a few years ago https://stackoverflow.com/questions/24785439/deleting-1-millions-rows-in-sql-server/24785802#24785802 — Dave C, May 09 '17 at 20:53

score 1 · Answer 1 · answered May 09 '17 at 21:04

I think this is what you're after...

This will delete 4000 rows from the tables with the foreign key references (assuming 1:1) before deleting the same 4000 rows from the main table.

It will loop until done, or it hits the stop time (if enabled).

DECLARE @BATCHSIZE INT, @ITERATION INT, @TOTALROWS INT, @MAXRUNTIME VARCHAR(8), @BSTOPATMAXTIME BIT, @MSG VARCHAR(500)
SET DEADLOCK_PRIORITY LOW;
SET @BATCHSIZE = 4000
SET @MAXRUNTIME = '08:00:00' -- 8AM
SET @BSTOPATMAXTIME = 1 -- ENFORCE 8AM STOP TIME
SET @ITERATION = 0 -- LEAVE THIS
SET @TOTALROWS = 0 -- LEAVE THIS

IF OBJECT_ID('TEMPDB..#TMPLIST') IS NOT NULL DROP TABLE #TMPLIST
CREATE TABLE #TMPLIST (ID BIGINT)

WHILE @BATCHSIZE>0
BEGIN
    -- IF @BSTOPATMAXTIME = 1, THEN WE'LL STOP THE WHOLE JOB AT A SET TIME...
    IF CONVERT(VARCHAR(8),GETDATE(),108) >= @MAXRUNTIME AND @BSTOPATMAXTIME=1
    BEGIN
        RETURN
    END

    TRUNCATE TABLE #TMPLIST

    INSERT INTO #TMPLIST (ID)
    SELECT TOP(@BATCHSIZE) ID
    FROM MAINTABLE
    WHERE X=Y -- DELETE CRITERIA HERE...  

    SET @BATCHSIZE=@@ROWCOUNT

    DELETE T1
    FROM SOMETABLE1 T1
    WHERE EXISTS (SELECT 1 FROM #TMPLIST T WHERE T1.MAINID=T.ID)

    DELETE T2
    FROM SOMETABLE2 T2
    WHERE EXISTS (SELECT 1 FROM #TMPLIST T WHERE T2.MAINID=T.ID)

    DELETE T3
    FROM SOMETABLE3 T3
    WHERE EXISTS (SELECT 1 FROM #TMPLIST T WHERE T3.MAINID=T.ID)

    DELETE M
    FROM MAINTABLE M
    WHERE EXISTS (SELECT 1 FROM #TMPLIST T WHERE T3.MAINID=M.ID)

    SET @ITERATION=@ITERATION+1
    SET @TOTALROWS=@TOTALROWS+@BATCHSIZE
    SET @MSG = 'Iteration: ' + CAST(@ITERATION AS VARCHAR) + ' Total deletes:' + CAST(@TOTALROWS AS VARCHAR)
    RAISERROR (@MSG, 0, 1) WITH NOWAIT
END

How to delete a massive amount of data from several related tables in SQL on MS Server

1 Answers1