
I want to delete about three million rows, filtered by date, in SQL Server. I am using a very simple query and deleting the rows in batches, but the command has been executing for more than 3 hours and is still running. What could be the possible reasons for such low performance? Please find below the code I am using.

Create PROC [dbo].[DeleteOneWeekOldData] 
@StartDate DateTime,
@EndDate DateTime

AS

DECLARE @continue INT
DECLARE @rowcount INT

SET @continue = 1
WHILE @continue = 1
BEGIN
    SET ROWCOUNT 1000
    BEGIN TRANSACTION
    DELETE FROM WorkflowContentDetails WHERE  StartDateTime BETWEEN @StartDate AND @EndDate
    SET @rowcount = @@rowcount 
    COMMIT
    IF @rowcount = 0
    BEGIN
        SET @continue = 0
    END
END

GO
Prashant Kumar
  • How long does this take? `SELECT COUNT(*) FROM WorkflowContentDetails WHERE StartDateTime BETWEEN @StartDate AND @EndDate` Here you find some possible reasons: http://stackoverflow.com/a/10901711/284240 – Tim Schmelter Dec 16 '13 at 11:00
  • Is `StartDateTime` indexed? If not, the `WHERE` clause will take successively longer on each iteration, as it needs to scan through the previously read non-matching rows. – Martin Smith Dec 16 '13 at 11:01
  • Can you tell us the indexes you have on WorkflowContentDetails? – Bruno Costa Dec 16 '13 at 11:02
  • StartDateTime has a nonclustered index on the table – Prashant Kumar Dec 16 '13 at 11:02
  • The format in which the StartDateTime values come is '2013-12-08 00:09:47.527' – Prashant Kumar Dec 16 '13 at 11:04
  • If you poll `sys.dm_os_waiting_tasks` whilst the delete is running what wait types do you see for that session? – Martin Smith Dec 16 '13 at 11:04
  • Deleting millions of rows takes its toll from the transaction itself, since every data change is logged. That's why a truncate of that table will run almost instantly and remove even more rows (truncates don't log individual changes). If you have FKs referencing that table, they will be checked to make sure you are not leaving orphaned rows, and that check will consume a bit of time. You can drop the indexes before the delete and recreate them after it to test the speed. – jean Dec 16 '13 at 11:05
  • If your challenge is to keep it responsive for users, a (maybe not feasible) way to do it is a logical delete instead: create a [deleted] flag column and just update it (and, the hard part, change your app to use it), then create a "clean deleted" job to run at a later (maintenance) time. – jean Dec 16 '13 at 11:11
  • What SQL Server version? Can you use a partitioned table? – Rubens Farias Dec 16 '13 at 11:14
  • Since you do not handle error cases, if you encountered an error earlier you could have a transaction left pending. Check whether this is the case with the @@TRANCOUNT variable. You should add a rollback somewhere in your code. – user3041160 Dec 16 '13 at 11:21
  • @user1677027 You need to filter the results of that to just the session that is doing the delete. – Martin Smith Dec 16 '13 at 11:22
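
For reference, a minimal way to poll the wait types for a single session, as suggested in the comments above (the session_id value 53 is just a placeholder for the SPID that is running the delete):

SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.blocking_session_id
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.session_id = 53;   -- placeholder: use the deleting session's SPID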

4 Answers


There are two problems with your query.

  1. SET ROWCOUNT 1000 causes the data to be deleted in very small chunks. If you need to delete 1M rows, that would be 1,000 separate delete operations. SQL Server handles a small number of big operations better than a large number of small operations. Besides, SET ROWCOUNT will no longer affect insert/update/delete statements in a future release; the supported replacement is DELETE TOP (n) (see the sketch after this list).

  2. StartDateTime BETWEEN @StartDate AND @EndDate is evaluated on every iteration. If there is no index on that column, it can take a lot of time. Instead, it is better to first select the values of the primary key column (if you have one, and if there is a clustered index defined on it) into a temp table and then loop over that - with the condition on an indexed column it will run several times faster.
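
A minimal sketch of point 1, assuming the same table and parameters as in the question: the deprecated SET ROWCOUNT can be swapped for DELETE TOP (n), which keeps the batching while using the supported syntax (the batch size of 100000 is only illustrative):

DECLARE @rowcount INT = 1;

WHILE @rowcount > 0
BEGIN
    -- TOP (n) bounds each batch; the loop stops once a batch deletes nothing
    DELETE TOP (100000)
    FROM WorkflowContentDetails
    WHERE StartDateTime BETWEEN @StartDate AND @EndDate;

    SET @rowcount = @@ROWCOUNT;
END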

AdamL

Your best performance in this case will come from partitioning the table by date and then truncating or dropping the partitions when they are no longer needed (rather than using the DELETE statement).

The reasons for the performance gain are:

1) By specifying which partition to truncate, by definition you are also defining the date range, so SQL Server does not need to hunt for it, whether with an index-based retrieval or a table scan, either of which will take time.

2) The TRUNCATE operation is DDL, not DML like DELETE, so it is only minimally logged (page deallocations rather than individual rows), making it much faster. It also does not carry the same risk of filling up your log files. (Of course this may have implications for incremental backups and replication, if you are using either of those.)

In the past, I have seen a TRUNCATE operation complete in less than a minute where the corresponding DELETE took hours.

A walk-through of the strategy is available here: http://www.galaxysql.com/2013/09/sql-server-partitioning-for-performance-and-archiving/
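
As a rough sketch of the strategy (the partition function, scheme, and boundary dates below are hypothetical; note also that TRUNCATE TABLE ... WITH (PARTITIONS ...) requires SQL Server 2016 or later, while on earlier versions the usual technique is to SWITCH the partition out to an empty staging table and drop that):

-- Hypothetical weekly partitioning on StartDateTime
CREATE PARTITION FUNCTION pf_WeeklyRange (DATETIME)
    AS RANGE RIGHT FOR VALUES ('2013-12-01', '2013-12-08', '2013-12-15');

CREATE PARTITION SCHEME ps_WeeklyRange
    AS PARTITION pf_WeeklyRange ALL TO ([PRIMARY]);

-- Once WorkflowContentDetails (or its clustered index) is rebuilt on
-- ps_WeeklyRange(StartDateTime), an entire week can be removed as a metadata-only operation:
TRUNCATE TABLE WorkflowContentDetails WITH (PARTITIONS (2));    -- SQL Server 2016+

-- Pre-2016 alternative: switch the old partition into an empty staging table, then drop it
-- ALTER TABLE WorkflowContentDetails SWITCH PARTITION 2 TO WorkflowContentDetails_Stage;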

Robert N

There are a few things that can cause a delete to be slow; you have already read about them in the comments.

For example, without an index you need 3M rows / 1K rows per batch = 3,000 searches of the table. What's more, you impose an exclusive lock on the table 3,000 times because of the missing index.

This is my "DELETE" pattern in production. It solves these problems even if there is no suitable index: a lock is taken only on the primary key, and only during the delete itself.

SET NOCOUNT ON

DECLARE 
    @StartDate DateTime = '20130101',
    @EndDate DateTime = '20131201'

DECLARE @chunk_size bigint = 1000
DECLARE @RowCount bigint;
DECLARE @delay      DATETIME = '00:00:01'       --- 1 second by default; used for delaying each iteration of the loop, can be 0

DECLARE @Chunk_IDs as TABLE (
    [ID] int NOT NULL,
    [SourcePKID] int not null
);

IF OBJECT_ID('tempdb..#temp_OldDataIDs') is not null
    DROP TABLE #temp_OldDataIDs;

CREATE TABLE #temp_OldDataIDs (
    [ID]            INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    [SourcePKID]    INT NOT NULL,
    [DeletedStatus] BIT NOT NULL DEFAULT 0);

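-- Collect the primary keys of every row in the date range once, so the
-- (possibly unindexed) range scan over StartDateTime runs a single time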
INSERT INTO #temp_OldDataIDs ([SourcePKID])
SELECT [ID]
FROM WorkflowContentDetails 
WHERE StartDateTime BETWEEN @StartDate AND @EndDate

SET @RowCount = @@ROWCOUNT;

CREATE NONCLUSTERED INDEX IX_#temp_OldDataIDs on #temp_OldDataIDs ([DeletedStatus]) include([ID],SourcePKID)

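-- Delete in chunks of @chunk_size rows, driven by the pre-collected keys instead of the date range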
WHILE (@RowCount != 0)
BEGIN

    DELETE @Chunk_IDs;

    INSERT INTO @Chunk_IDs ([ID], [SourcePKID])
    SELECT TOP (@chunk_size)
        [ID], [SourcePKID]
    FROM #temp_OldDataIDs
    WHERE [DeletedStatus] = 0
    ORDER BY [ID];

    SET @RowCount = @@ROWCOUNT;
    IF @RowCount = 0 BREAK;


    DELETE WorkflowContentDetails
    FROM WorkflowContentDetails
        INNER JOIN @Chunk_IDs ChunkIDs ON WorkflowContentDetails.[ID] = ChunkIDs.[SourcePKID];

    UPDATE OldIDs
        SET [DeletedStatus] = 1
    FROM #temp_OldDataIDs OldIDs
        INNER JOIN @Chunk_IDs ChunkIDs ON OldIDs.[ID] = ChunkIDs.[ID];

-- debug
-- PRINT CAST(@RowCount as varchar(30)) + ' ' + CONVERT(varchar(30), GETDATE(),121)

    -- The requested delay will hold the loop here as requested.
    WAITFOR DELAY  @delay
END


GO
AlexK
  • This is a really old answer, but I was linked to it recently and I believe there's a small bug in the code - The `WorkFlowContentDetails` ID was inserted into the chunk table as the column name `SourcePKID`, meaning within the DELETE statement your join should use the join condition `ON WorkflowContentDetails.[ID] = ChunkIDs.[SourcePKID]`, not `ChunkIDs.[ID]` – nemec Sep 10 '18 at 21:30

Try this. Why are you using a loop? What is its purpose?

Create PROC [dbo].[DeleteOneWeekOldData] 
@StartDate DateTime,
@EndDate DateTime

AS
begin
Set NoCount on
DECLARE @continue INT
DECLARE @rowcount INT
Declare @err int
--SET @continue = 1  -- why are you using this commented-out part?
--WHILE @continue = 1
--BEGIN
--    SET ROWCOUNT 1000
    BEGIN TRANSACTION
    DELETE FROM WorkflowContentDetails WHERE StartDateTime BETWEEN @StartDate AND @EndDate
    SET @err = @@ERROR
    IF (@err = 0)      -- commit only when the delete succeeded
        COMMIT
    ELSE
        ROLLBACK
END

GO
KumarHarsh
  • If I use a direct delete, the data is about 3 million rows and it can lead to the logs being completely filled, as every change is logged. – Prashant Kumar Dec 16 '13 at 13:52