52

I am working on a client's database and there are about 1 million rows that need to be deleted due to a bug in the software. Is there an efficient way to delete them besides:

DELETE FROM table_1 where condition1 = 'value' ?
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Peter Sun
  • 1,675
  • 4
  • 27
  • 50
  • 4
    Yes you need to delete them in batches. Deleting a million rows in a single statement will destroy your transaction log. – Sean Lange Jul 16 '14 at 16:01
  • could I write a script or something, like looping the following statement: `Delete from table_1 where condition_1 = 'value' order by id LIMIT 1000`? – Peter Sun Jul 16 '14 at 16:04
  • possible duplicate of [What is the best way to delete a large number of records in t-sql?](http://stackoverflow.com/questions/794865/what-is-the-best-way-to-delete-a-large-number-of-records-in-t-sql) – huMpty duMpty Jul 16 '14 at 16:05
  • [Delete large amount of data in sql server](http://stackoverflow.com/questions/2138593/delete-large-amount-of-data-in-sql-server) – huMpty duMpty Jul 16 '14 at 16:06
  • 1
    That's not a dupe @huMptyduMpty... it was just poorly titled. The OP isn't looking to truncate their table. – Ben Jul 16 '14 at 16:06
  • TRUNCATE TABLE if bad data is not mixed in with good. – CodeMonkey Jul 16 '14 at 16:10
  • Not sure. Seriously. I delete my rows in a stored procedure - but 64 million at a time (per statement) in a loop. So it really depends on your logs (mine are around 400GB preallocated because I do bulk operations in that database). – TomTom Jul 16 '14 at 16:13
  • 15
    If you delete more than 5000 rows in a single transaction, SQL Server will do a **lock escalation** and lock the entire table in exclusive mode, for the duration of the whole transaction. No one can do anything with that table anymore, not even select from it, until you finish your transaction. I'd delete those rows in batches of **less than 5000** each to avoid such a massive interruption – marc_s Jul 16 '14 at 16:14
  • couldn't I just make a copy of the good data to a temp table, truncate the table, then add the good data back? Something like this: `SELECT * INTO tmp_My_Table FROM My_Table WHERE <condition>; TRUNCATE TABLE My_Table; INSERT INTO My_Table SELECT * FROM tmp_My_Table` – Peter Sun Jul 16 '14 at 16:14
  • 1
    @PeterSun You could do that so long as you don't have foreign key references to `My_Table`. – Brian Driscoll Jul 16 '14 at 16:17
  • 1
    Possible duplicate of http://stackoverflow.com/questions/11230225/how-to-efficiently-delete-rows-while-not-using-truncate-table-in-a-500-000-rows – Ruud Helderman Jul 16 '14 at 16:26
  • 2
    @marc_s of course if you use chunks and commit in between, then the lock escalation becomes much less relevant. http://sqlperformance.com/2013/03/io-subsystem/chunk-deletes – Aaron Bertrand Jul 16 '14 at 16:37
  • @marc_s have you got a reference for the figure of 5000? –  Jul 16 '14 at 17:02
  • @JackDouglas: see [Lock Escalation documentation](http://technet.microsoft.com/en-us/library/ms184286%28v=sql.105%29.aspx) on TechNet – marc_s Jul 16 '14 at 18:11
  • @marc_s are you saying that 5000 locks = 5000 rows? –  Jul 16 '14 at 18:58
  • @JackDouglas: yes - by default, SQL Server will put a separate lock on every single row - until it reaches 5000 locks. – marc_s Jul 16 '14 at 20:52
  • @marc_s thanks. I asked about this [in The Heap](http://chat.stackexchange.com/transcript/message/16618874#16618874) and it kicked off an interesting conversation. I don't know the exact conditions but apparently [a large operation might start with page-level locks](http://chat.stackexchange.com/transcript/message/16624014#16624014). I'm trying to dig up references... –  Jul 17 '14 at 05:32
  • @JackDouglas: no, SQL Server doesn't do page-level locks. It will use row-level locks until it hits that threshold of 5000 locks per transaction, and then it escalates straight to table-level locks. – marc_s Jul 17 '14 at 06:04
  • @marc_s: No page locks in SQL Server? [Hmm...](http://technet.microsoft.com/en-us/library/aa213039%28v=sql.80%29.aspx) – Andriy M Jul 17 '14 at 06:20
  • @AndriyM: not in the process of a **lock escalation** - it goes straight from row-level locks to table-level lock (or optionally to partition-level locks first, if the table is partitioned) - but it **never** goes to page-level locking – marc_s Jul 17 '14 at 07:05
  • @marc_s: Oh I see, I missed that, sorry. – Andriy M Jul 17 '14 at 07:08
  • For anyone wondering if these threshold values are still accurate, here's the latest SQL Server documentation: [Lock Escalation Thresholds](https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-locking-and-row-versioning-guide?view=sql-server-ver15#lock-escalation-thresholds) – MrBootAndTheBuilding Jan 13 '21 at 15:57

6 Answers

88

Here is a structure for a batched delete as suggested above. Do not try 1M at once...

The size of the batch and the WAITFOR delay are obviously quite variable, and depend on your server's capabilities as well as your need to mitigate contention. You may need to delete a few rows manually, measure how long they take, and adjust your batch size to something your server can handle. As mentioned above, anything over 5000 rows per transaction can trigger lock escalation to a table lock (which I was not aware of).

This would be best done after hours, but 1M rows is really not a lot for SQL Server to handle. If you watch the Messages tab in SSMS, the output may take a while to appear; it shows up after several batches rather than updating in real time.

Edit: Added a stop time via @MAXRUNTIME & @BSTOPATMAXTIME. If you set @BSTOPATMAXTIME to 1, the script will stop on its own at the desired time, say 8:00 AM. This way you can schedule it nightly to start at, say, midnight, and it will stop before production hours at 8 AM.

Edit: This answer is pretty popular, so I have added RAISERROR in lieu of PRINT per the comments.

DECLARE @BATCHSIZE INT, @WAITFORVAL VARCHAR(8), @ITERATION INT, @TOTALROWS INT, @MAXRUNTIME VARCHAR(8), @BSTOPATMAXTIME BIT, @MSG VARCHAR(500)
SET DEADLOCK_PRIORITY LOW;
SET @BATCHSIZE = 4000
SET @WAITFORVAL = '00:00:10'
SET @MAXRUNTIME = '08:00:00' -- 8AM
SET @BSTOPATMAXTIME = 1 -- ENFORCE 8AM STOP TIME
SET @ITERATION = 0 -- LEAVE THIS
SET @TOTALROWS = 0 -- LEAVE THIS

WHILE @BATCHSIZE>0
BEGIN
    -- IF @BSTOPATMAXTIME = 1, THEN WE'LL STOP THE WHOLE JOB AT A SET TIME...
    IF CONVERT(VARCHAR(8),GETDATE(),108) >= @MAXRUNTIME AND @BSTOPATMAXTIME=1
    BEGIN
        RETURN
    END

    DELETE TOP(@BATCHSIZE)
    FROM SOMETABLE
    WHERE 1=2 -- PLACEHOLDER CONDITION: REPLACE WITH YOUR ACTUAL DELETE CRITERIA

    SET @BATCHSIZE=@@ROWCOUNT
    SET @ITERATION=@ITERATION+1
    SET @TOTALROWS=@TOTALROWS+@BATCHSIZE
    SET @MSG = 'Iteration: ' + CAST(@ITERATION AS VARCHAR) + ' Total deletes:' + CAST(@TOTALROWS AS VARCHAR)
    RAISERROR (@MSG, 0, 1) WITH NOWAIT
    WAITFOR DELAY @WAITFORVAL 
END
Dave C
  • 7,272
  • 1
  • 19
  • 30
  • 1
    Nice suggestion ... must admit. – Rahul Jul 16 '14 at 16:30
  • 12
    You should commit after each delete (and really no need to put a WAITFOR in there - other transactions will be able to get in between the commit and the next transaction no matter how fast it is, so all this does is make the process take 10 seconds * number of batches longer). You can also RAISERROR WITH NOWAIT instead of PRINT to avoid having to wait for the buffer to fill up before seeing messages. – Aaron Bertrand Jul 16 '14 at 16:39
  • Thanks Aaron! I'll definitely look into them for my own usage. – Dave C Jul 16 '14 at 17:03
  • 34
    Any delete operation will cause locking - on the **row level**. But once your transaction has more than 5000 row-level locks, SQL Server will do a **lock escalation** and lock the **entire table** in exclusive mode.... – marc_s Jul 17 '14 at 06:06
  • 3
    Wow @marc_s! That may be the most beneficial simple piece of information that I have ever learned about SQL. Thanks for sharing! – Adam Scharp Jun 18 '15 at 15:26
  • 2
    This worked well for me, and it did take a long time - in fact, multiple runs - to get the rows deleted. I added a transaction around the statement in hopes that it would keep the log from growing too large, which seemed to work well. I had a table with 90MM rows that I needed to delete about 10MM rows from - and this was on a SQL Express db at the 10GB limit. Very handy. – qxotk Oct 12 '16 at 15:59
  • @condiosluzverde - how long did it take? I am trying to delete 10M rows from a table this way while inserting a single row every 3 seconds into the same table. Inserts are working fine. No locking is happening. But the delete is taking forever. – Udit Solanki May 15 '18 at 11:14
  • Backing up @marc_s's insightful comment with a link to MS Docs confirming the lock escalation when the 5000 row threshold is exceeded during delete: https://learn.microsoft.com/en-us/troubleshoot/sql/performance/resolve-blocking-problems-caused-lock-escalation#lock-escalation-thresholds – JonoB Jun 29 '21 at 11:16
  • 1
    8 years later and this is still one of the most helpful SQL Server answers on SO – Whelkaholism Nov 10 '22 at 09:15
13

BEGIN TRANSACTION     
    DoAgain:
    DELETE TOP (1000)
    FROM <YourTable>
    IF @@ROWCOUNT > 0
    GOTO DoAgain
COMMIT TRANSACTION
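
As the comments below point out, this wraps every batch in one outer transaction, so the log cannot be reused between batches, and @@ROWCOUNT needs to be captured before a commit resets it. A per-batch-commit variant is sketched here (not part of the original answer; it reuses the table and condition from the question):

DECLARE @Rows INT = 1;

WHILE @Rows > 0
BEGIN
    BEGIN TRANSACTION;

    -- table and condition taken from the question
    DELETE TOP (1000)
    FROM table_1
    WHERE condition1 = 'value';

    -- capture @@ROWCOUNT before COMMIT resets it to 0
    SET @Rows = @@ROWCOUNT;

    COMMIT TRANSACTION; -- committing each batch lets log space be reused
END
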
sansalk
  • 4,595
  • 2
  • 36
  • 37
  • 3
    Isn't the transaction outside of the loop, thus making the transaction log fill like it would do without the transaction block? I think you should commit after each 1000 deletes. – Koja May 31 '18 at 13:46
  • 1
    In that case @@ROWCOUNT seems to be reset. I solved this by defining a local variable outside the scope of the transaction, and assigning @@ROWCOUNT to the variable – Matt Evans Nov 20 '18 at 07:33
11

Maybe this solution from Uri Dimant will help:

WHILE 1 = 1
BEGIN
   DELETE TOP(2000)
   FROM Foo
   WHERE <predicate>;
   IF @@ROWCOUNT < 2000 BREAK;
END

(Link: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b5225ca7-f16a-4b80-b64f-3576c6aa4d1f/how-to-quickly-delete-millions-of-rows?forum=transactsql)

Tung Nguyen
  • 574
  • 8
  • 16
6

Here is something I have used:

  1. If the bad data is mixed in with the good:

    -- assumes #table has already been created with a matching schema
    INSERT INTO #table 
       SELECT columns 
       FROM old_table 
       WHERE statement to exclude bad rows
    
    TRUNCATE TABLE old_table
    
    INSERT INTO old_table 
       SELECT columns FROM #table
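
If other tables reference old_table with foreign keys, TRUNCATE TABLE will fail; as noted in the comments, the constraints can be dropped and recreated around the truncate. A rough sketch, using hypothetical table and constraint names:

    -- hypothetical names: child_table references old_table(id) via FK_child_old
    ALTER TABLE child_table DROP CONSTRAINT FK_child_old;
    
    TRUNCATE TABLE old_table;
    
    -- reload the good rows from #table, then recreate the constraint
    ALTER TABLE child_table
        ADD CONSTRAINT FK_child_old FOREIGN KEY (old_table_id)
        REFERENCES old_table (id);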
    
CodeMonkey
  • 1,087
  • 2
  • 15
  • 24
  • 5
    This approach would potentially cause a lot of problems with referential integrity and indexing. – Brian Driscoll Jul 16 '14 at 16:18
  • Could potentially cause, yes. I would expect Peter to know the data as he is the one in it. – CodeMonkey Jul 16 '14 at 16:24
  • 1
    I've done this same thing numerous times. But yes, it does depend a lot on issues @BrianDriscoll just pointed out. But assuming the table's referred to by few to no FK's and the remaining data is a fraction of this million, this would by far be the easiest, fastest and lightest (log-wise) solution. – Kahn Jul 17 '14 at 06:42
  • I would normally vote down a suggestion to use TRUNCATE TABLE - since the question states it needs to keep some data - but did not vote down, because the answer offers info in addition to the truncate option. The answer could be improved by removing the TRUNCATE TABLE part, or moving it to the end as a reminder that the truncate option is available if you need to delete *all* rows. – qxotk Sep 28 '16 at 15:35
  • 1
    assuming, there is no FK's pointing to this table, this solution is perfect, thank you! – Steve V Dec 16 '19 at 18:43
  • 1
    @BrianDriscoll you can always drop the constraints temporarily and recreate them later in order to use truncate. https://stackoverflow.com/questions/3843806/disabling-foreign-key-constraint-still-cant-truncate-table-sql-server-2005 – tinker Jun 04 '21 at 16:26
3

Not sure how good this would be, but what if you do something like the steps below (provided table_1 is a stand-alone table, i.e. not referenced by any other table)? A combined sketch follows the list.

  1. create a duplicate table of table_1 like table_1_dup

  2. insert into table_1_dup select * from table_1 where condition1 <> 'value';

  3. drop table table_1

  4. sp_rename table_1_dup table_1
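
Put together as T-SQL (a sketch only; note that sp_rename takes quoted, comma-separated names, and that SELECT ... INTO does not copy indexes or constraints):

-- 1. create an empty duplicate of table_1's structure
SELECT * INTO table_1_dup FROM table_1 WHERE 1 = 0;

-- 2. copy over only the good rows
INSERT INTO table_1_dup SELECT * FROM table_1 WHERE condition1 <> 'value';

-- 3. drop the original table
DROP TABLE table_1;

-- 4. rename the duplicate back to the original name
EXEC sp_rename 'table_1_dup', 'table_1';
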

Rahul
  • 76,197
  • 13
  • 71
  • 125
  • 1
    Under what circumstances would a lone database table ever exist except in the simplest college examples? Why even use a relational database if your tables won't contain relational data? – Scuba Steve Aug 09 '18 at 23:19
3

If you cannot afford to take the database out of production while repairing, do it in small batches. See also: [How to efficiently delete rows while NOT using Truncate Table in a 500,000+ rows table](http://stackoverflow.com/questions/11230225/how-to-efficiently-delete-rows-while-not-using-truncate-table-in-a-500-000-rows)

If you are in a hurry and need the fastest way possible:

  • take the database out of production
  • drop all non-clustered indexes and triggers (a sketch of the index step follows this list)
  • delete the records (or, if the majority of records are bad, copy+drop+rename the table)
  • (if applicable) fix the inconsistencies caused by the fact that you dropped triggers
  • re-create the indexes and triggers
  • bring the database back in production
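
For the index step, one low-effort option is to disable the non-clustered indexes and rebuild them afterwards instead of scripting and dropping them; a minimal sketch with a hypothetical index name (triggers would still need to be scripted out and recreated separately):

-- disable a non-clustered index before the mass delete (hypothetical name)
ALTER INDEX IX_table_1_condition1 ON table_1 DISABLE;

-- ... perform the batched delete here ...

-- rebuild the index afterwards
ALTER INDEX IX_table_1_condition1 ON table_1 REBUILD;
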
Ruud Helderman
  • 10,563
  • 1
  • 26
  • 45