-1

i have written this t-sql stored procedure. This is working fine , but since this stored procedure will be used to delete a lot of data (for example 1M to 2M) , I think, this can cause some locks in the table, or cause some db performance etc. So I am thinking, if we delete in batch for example at a time delete 1000 records. I am not totally sure about how to do this without cause any issue in db.

ALTER PROCEDURE [schema].[purge_data] @count INT---(count input can be in millions)
AS
DECLARE @p_number VARCHAR(22)
    ,@p_r_number VARCHAR(5)

DECLARE data_cursor CURSOR
FOR
SELECT TOP (@count) JRC_policy_number
    ,jrc_part_range_nbr
FROM [staging].[test].[p_location]
WHERE JRC_POLICY_TERM_DT < CAST('19950101 00:00:00.000' AS DATETIME)
    AND jrc_policy_status = 'T'

OPEN data_cursor

FETCH NEXT
FROM data_cursor
INTO @p_number
    ,@p_r_number

WHILE @@FETCH_STATUS = 0
BEGIN
    DELETE
    FROM [staging].[test].[p_location]
    WHERE JRC_policy_number = @p_number
        AND jrc_part_range_nbr = @p_r_number

    FETCH NEXT
    FROM data_cursor
    INTO @p_number
        ,@p_r_number
END

CLOSE data_cursor

DEALLOCATE data_cursor

Edit :

I had already tried without cursor - direct delete query like below.

DELETE TOP (1000) FROM [MyTab] WHERE YourConditions

It was very fast , it took 34 seconds to delete 1M records, but , during the 34 seconds, the table was locked completely. In production p_locator table is being used 24/7 , and being used by a very critical application, which expects response time in milliseconds, our purge script should not impact the the main application in any way. that's why I have chosen this cursor approach. pls guide

F0cus
  • 585
  • 3
  • 18
  • 52
  • if this procedure needs to delete lots and lots of data, then why do you use the slowest method that exists on a database, a cursor ? – GuidoG Aug 24 '22 at 08:38
  • 2
    Alternatives for cursor https://stackoverflow.com/questions/8955897/how-to-delete-the-top-1000-rows-from-a-table-using-sql-server-2008 – P.Salmon Aug 24 '22 at 08:56
  • i thought any other method may cause some locks etc. I already tried `DELETE TOP (1000) FROM [MyTab] WHERE YourConditions` this code completely locked the table – F0cus Aug 24 '22 at 09:37
  • Since the table `p_location` is being used by a critical application which runs 24/7 and they expect the response time in milliseconds. a little bit trade off in delete time is acceptable, but this deletion process should not impact the main application. – F0cus Aug 24 '22 at 09:41
  • @OrangeAndGrapes - What is the DDL for the indexes that are on the table? Also, are there any FKs that point at this table, regardless of which column they point to?? – Jeff Moden Aug 25 '22 at 03:08
  • @jeff Moden I have added the screen shot of the table description. I don't see any Foreign keys, but three PKs - it must be some primary keys. – F0cus Aug 25 '22 at 06:04
  • https://www.mssqltips.com/sqlservertip/6238/fastest-way-to-delete-large-number-of-records-in-sql-server/ – Mitch Wheat Aug 25 '22 at 06:05
  • 1
    https://blog.greglow.com/2020/02/27/minimizing-locking-while-archiving-rows-from-a-main-table-in-sql-server-t-sql/ – Mitch Wheat Aug 25 '22 at 06:06
  • 1
    https://www.erikdarlingdata.com/why-does-fast_forward-make-some-cursors-so-much-faster/ – Mitch Wheat Aug 25 '22 at 06:07
  • @GuidoG, I've seen slower. Cursor is one of the faster ways to do RBAR. However, this the worst kind of Cursor and all RBAR is just wrong. – Jodrell Aug 25 '22 at 12:51
  • 1
    @Jodrell - Being the guy that invented the RBAR term, I'll tell you that "It Depends" and this isn't fully RBAR. It's NOT deleting one row at a time. It's being used as a control mechanism to delete smaller sets. It's like the difference between incremental counting and exploding a hierarchy using an rCTE... the former is RBAR, the latter is "multi-set based". – Jeff Moden Aug 25 '22 at 13:46
  • @OrangeAndGrapes - It's not 3 PKs... It's a single PK with 3 keys and to guarantee some speed, you're deletes should use all 3. You should also probably have an index on the JRC_POLICY_TERM_DT column. I wish you hadn't blocked out the names of the columns... it makes it impossible to write a bit of code to help. I'd also write the code to capture those 3 columns in a temp table and use that as a driver for a delete. You might also want to add a 1 second "WAIT FOR DELAY" in the loop to be sure other processes have the priority (making it a "DELETE CRAWLER"). – Jeff Moden Aug 25 '22 at 13:55
  • @JeffModen, To be honest, I didn't dive too deeply into the query but, I saw the cursor was not FAST_FOWARD. I'm assuming `[JRC_p_number], [j_p_r_nbr]` represents a candidate key on `[staging].[test].[p_location]`. – Jodrell Aug 25 '22 at 14:03
  • Using a fast_forward cursor to slowly yield the limits of buckets of rows spread thinly across the clustered index, throttled by a time delay, rather than performance, sounds like a good approach. You may want to do something about fragmentation after completion. – Jodrell Aug 25 '22 at 14:12
  • Let's hope the PK keys are "temporal" in nature and so it eats away at complete pages at the "top" of the table so there won't be any logical or physical (page density) fragmentation issues. – Jeff Moden Aug 25 '22 at 15:44
  • @Jeff Moden added the screen shot - Could you pls tell me if code (in the answer section) is good enough? actually it's taking 13 minutes to delete1M records. – F0cus Aug 25 '22 at 16:06
  • @OrangeAndGrapes - Dunno yet. Is there an index on the JRC_POLICY_TERM_DT? Also, you say it's taking 13 minutes to delete 1M "records". Do you mean all at once? And how many total "records" meet the criteria for deletion? – Jeff Moden Aug 25 '22 at 16:10
  • @JeffModen there are no Index on `JRC_POLICY_TERM_DT` column , but there is a cluster index on those primary keys. We have totally 41 million records to be deleted(which matches the criteria). We are planning to delete 1 or 2 Million at a time. I tested the stored Procedure in the answer section by passing 1 million - and it took 13 minutes – F0cus Aug 25 '22 at 16:19
  • @JeffModen current plan is to delete 1 or 2 million in a day. – F0cus Aug 25 '22 at 16:27
  • @user_che_ban - If there was no substantial blocking and it's deleting the correct stuff, then it sounds like you've got it done. You may now want to setup nightly rolling deletes based on a max "keep it" time. Another thing you could do is "partitioning". I prefer partitioned views so I can put the old stuff in a different static database that never gets backed up except for any new archive tables and then they only get backed up once. It also makes DR restores to "get back in business" very rapid and the legacy data can be restored at a more leisurely pace . – Jeff Moden Aug 25 '22 at 20:02

1 Answers1

0

With some of your references I've written the below stored proc. Ofcourse there will be ALOT of scope for improvements. Pls share.

ALTER PROCEDURE [dbo].[purge_data] @count INT
AS
DECLARE @iteration INT
       ,@remainder INT
       ,@current_count INT

BEGIN
    SELECT @current_count = count(*)
    FROM PROD_TBL
    WHERE JRC_POLICY_TERM_DT < CAST('19950101 00:00:00.000' AS DATETIME)
        AND JRC_POLICY_STATUS = 'T'
        AND JRC_PLCY_ADMIN_SYS_CD = 'X'

    IF (@current_count < @count)
    BEGIN
        SET @count = @current_count
    END

    SET @iteration = @count / 10000
    SET @remainder = @count % 10000

    WHILE (@iteration > 0)
    BEGIN
        DELETE
        FROM PROD_TBL
        FROM (
            SELECT TOP 10000 JRC_POLICY_NUMBER
                ,JRC_PART_RANGE_NBR
                ,JRC_PLCY_ADMIN_SYS_CD
            FROM PROD_TBL
            WHERE JRC_POLICY_TERM_DT < CAST('19950101 00:00:00.000' AS DATETIME)
                AND JRC_POLICY_STATUS = 'T'
                AND JRC_PLCY_ADMIN_SYS_CD = 'X'
            ) pol_locator_tbl
        WHERE PROD_TBL.JRC_POLICY_NUMBER = pol_locator_tbl.JRC_POLICY_NUMBER
            AND PROD_TBL.JRC_PART_RANGE_NBR = pol_locator_tbl.JRC_PART_RANGE_NBR
            AND PROD_TBL.JRC_PLCY_ADMIN_SYS_CD=pol_locator_tbl.JRC_PLCY_ADMIN_SYS_CD

        SET @iteration = @iteration - 1
    END

    IF (@remainder > 0)
    BEGIN
        DELETE
        FROM PROD_TBL
        FROM (
            SELECT TOP (@remainder) JRC_POLICY_NUMBER
                ,JRC_PART_RANGE_NBR
                ,JRC_PLCY_ADMIN_SYS_CD
            FROM PROD_TBL
            WHERE JRC_POLICY_TERM_DT < CAST('19950101 00:00:00.000' AS DATETIME)
                AND JRC_POLICY_STATUS = 'T'
                AND JRC_PLCY_ADMIN_SYS_CD = 'X'
            ) pol_locator_tbl
        WHERE PROD_TBL.JRC_POLICY_NUMBER = pol_locator_tbl.JRC_POLICY_NUMBER
            AND PROD_TBL.JRC_PART_RANGE_NBR = pol_locator_tbl.JRC_PART_RANGE_NBR
            AND PROD_TBL.JRC_PLCY_ADMIN_SYS_CD=pol_locator_tbl.JRC_PLCY_ADMIN_SYS_CD

    END
END
END
F0cus
  • 585
  • 3
  • 18
  • 52