
There's a multi-step procedure in the Data Warehouse that generates a temp table with a list of Jobs that will be processed for each batch. Usually this is about 5,000 jobs. By the end of financial aggregation we may be looking at about 500,000 records processed. I've noticed that a very small part of it is giving me an Early Timeout on Optimization for just this part of the stored procedure:

DELETE jfs
FROM DataWarehouse.dbo.JobFinancialSummary jfs             -- Financials table (> 3,000,000 records with indices)
INNER JOIN #JobList jl ON jfs.JobID = jl.JobID             -- List of Jobs being processed (avg. of 5,000 records)
INNER JOIN FiscalPeriod fp ON fp.ID = jfs.FiscalPeriodID   -- Month Reference Table (about 1,000 records)
WHERE fp.[Status] IN (1,2)   -- Last 2 months

The most confusing thing is that this is a relatively simple part of the stored procedure and all of the JOINs are on indices. My only question is how it gets timed out when the optimizer evaluates this. My understanding is that the optimizer gives each statement its own "Budget" but perhaps I'm missing something. Why the timeout here?
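For reference, whether the optimizer gave up early can be read straight from the plan: showplan XML records it in the `StatementOptmEarlyAbortReason` attribute. A sketch for searching the plan cache for such plans (the `LIKE` pattern is an assumption about how the attribute is serialized):

```sql
-- Find cached plans whose optimization ended early with a timeout.
SELECT TOP (20) st.[text], qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE CAST(qp.query_plan AS nvarchar(max))
      LIKE N'%StatementOptmEarlyAbortReason="TimeOut"%';
```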

marc_s
    Questions like this tend to get more engagement when you provide the table schema along with row counts, the indexes on the tables involved, and the execution plan of the query. – TheGameiswar Sep 02 '16 at 19:42

2 Answers


There are many reasons a delete can be very slow:

  • The table might have CDC or Change Tracking (CT) enabled
  • The table might have triggers that fire after the delete operation
  • The table might have FK references or constraints
  • The table might have indexed views associated with it
  • Above all, we need to consider how heavily transactional the table is

There are many ways to do the delete, and a batched delete is obviously faster. The first and most common approach is to mark the records with a soft-delete attribute and delete them offline during a nightly window.

If it is going to be an offline delete, then to make it faster:

  • Capture the clustered index keys for the rows to be deleted
  • Disable indexes, FKs, and constraints, since you will be handling these yourself functionally
  • Disable CT, CDC, etc. on the table if they are not required; script out any indexed views associated with the table and then drop them
  • Then delete in batches, controlled with @@ROWCOUNT or TOP with a batch size -- any number of records can be deleted faster this way

    DELETE TOP (50000)   -- batch size based on your scenario
    FROM table1;         -- run repeatedly in a loop

Call an explicit CHECKPOINT to make sure records are cleared from the transaction log, and make sure your recovery model is SIMPLE, not FULL. If it is going to be an online delete during the daytime, mark the rows as soft-deleted anyway and run the actual deletes in very small batches from the nightly job.
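The batching steps above can be sketched as a loop; `table1` and the batch size are illustrative placeholders, not from the original procedure:

```sql
-- Sketch of a batched delete under the SIMPLE recovery model.
-- table1 and the 50,000 batch size are placeholders.
DECLARE @BatchSize int = 50000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    FROM table1;

    IF @@ROWCOUNT < @BatchSize BREAK;  -- last (partial) batch is done

    CHECKPOINT;  -- under SIMPLE recovery, lets log space be reused
END
```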

Kannan Kandasamy

It will likely be faster to first isolate the rows you want to delete, rather than joining while deleting.

Something like this, assuming ID is your primary key/identity:

IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp

SELECT jfs.ID 
INTO #tmp
FROM DataWarehouse.dbo.JobFinancialSummary jfs        -- Financials table (> 3,000,000 records with indices)
INNER JOIN FiscalPeriod fp ON fp.ID = jfs.FiscalPeriodID    -- Month Reference Table (about 1,000 records)
WHERE fp.[Status] IN (1,2)
/*  EXISTS IS FASTER THAN A JOIN AND AVOIDS FAN-OUT (ROW DUPLICATION) */
AND EXISTS (SELECT 1 FROM #JobList jl where jfs.JobID = jl.JobID)       -- List of Jobs being processed (avg. of 5,000 records)

Then issue a delete such as:

DELETE TOP(1000) jfs
FROM DataWarehouse.dbo.JobFinancialSummary jfs 
WHERE EXISTS (SELECT 1 FROM #tmp t WHERE jfs.ID=t.ID)

From there, depending on how many rows you are deleting, you may wish to run the delete in batches overnight -- anything over roughly 5,000 rows will escalate to a table lock, which makes this a prime candidate for batch deletion.
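The TOP(1000) delete above can simply be repeated until the staged keys are exhausted; a minimal sketch:

```sql
-- Repeat the batched delete until no matching rows remain.
-- 1,000 rows per batch stays well below the lock-escalation threshold.
WHILE 1 = 1
BEGIN
    DELETE TOP (1000) jfs
    FROM DataWarehouse.dbo.JobFinancialSummary jfs
    WHERE EXISTS (SELECT 1 FROM #tmp t WHERE jfs.ID = t.ID);

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to delete
END
```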

I wrote a fairly popular answer on how to accomplish large batch deletes here:

Deleting 1 millions rows in SQL Server

Dave C