
Using SqlBulkCopy in C# I can insert a large number of rows into SQL Server, which is very fast. I need something similar for delete operations.

Is there anything available that offers a similarly high-performance delete operation in C#? Passing a list of row IDs to a stored procedure as a parameter is not a good approach, and calling a stored procedure many times is also not a good idea. So what is the best approach?

I need to delete around 30k rows from a table.

marc_s
jhon
    EntityFramework-extensions is a library which can be used for bulk operations like this - https://entityframework-extensions.net/bulk-delete – Adam T Nov 13 '20 at 16:23
  • Deleting 30K rows in a SQL Server table is a job for an ordinary DELETE query, and it's already the fastest possible way. Find a WHERE clause that adequately encompasses your 30 thousand records. – Robert Harvey Nov 13 '20 at 16:23
  • Only 30k? What sort of performance are you seeing for doing it in one go? What sort of performance for batching it as 3x10k or 6x5k deletes? PS: if you want to empty a table, truncate it. – Caius Jard Nov 13 '20 at 16:23
  • You can create a user-defined table type such as "IntegerTable", fill it on the client side, pass it into a stored procedure that accepts an "IntegerTable", and issue a DELETE FROM Target INNER JOIN'ed on @IntegerTable. Or, if using EF, Adam T posted a great example above. – Ross Bush Nov 13 '20 at 16:27
  • The fastest way of deleting rows is to use a primary key so the lookup of each row is quick. You probably do not want to do this from C#, since the interface between C# and SQL Server will add time. So I would use either SQLCMD.exe or PowerShell (which has SQLCMD.exe built in). See the command-line utilities: https://learn.microsoft.com/en-us/sql/tools/command-prompt-utility-reference-database-engine?view=sql-server-ver15. You can run the method from inside a C# application. – jdweng Nov 13 '20 at 16:30
  • I know it's not that much time, but in my case it's a lot of time. The user uploads a file with 30k rows; after processing them I need to delete those rows from the table and provide feedback, so I need to do it in minimal time. Right now it takes almost a minute, while the SqlBulkCopy insert takes only a few seconds. – jhon Nov 13 '20 at 16:37
  • Mark the uploaded records in some way so that, when you're ready to delete them, you can simply run a DELETE query for those records WHERE the mark exists. – Robert Harvey Nov 13 '20 at 16:48
  • If your records are time-stamped when you bulk-insert them, you can simply filter over the time stamp in the DELETE query. – Robert Harvey Nov 13 '20 at 16:50
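The batch-marking idea from the last few comments can be sketched as follows (the table name `Target`, column `UploadBatchId`, and variable `@BatchId` are assumed names, not part of the original schema):

```sql
-- Assumes the bulk insert stamped each row with the upload batch's ID.
-- After processing, one set-based DELETE removes the whole batch.
DELETE FROM Target
WHERE UploadBatchId = @BatchId;
```

With an index on `UploadBatchId`, this runs as a single set-based operation rather than 30k individual deletes.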

5 Answers


You can make an additional table with a column for your target IDs, and perhaps also a job ID. Then you can insert your IDs into this table. You'll end up with around 30k rows in the new table: one for each ID to delete. If your table's IDs are not the primary key, this is also the time to make that translation. You can let this part of the operation be a little slow if you need to, because it won't interfere with other parts of your database in terms of locks, or you can use a bulk insert technique, which you already seem comfortable with.

Once this table is populated, you can write a DELETE statement that includes a JOIN to this new table. In my experience, this will be the fastest way to complete your delete operation.
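A minimal sketch of the staging-table pattern described above (all table and column names here are assumptions):

```sql
-- Staging table holding the IDs to delete; populate it from C#,
-- e.g. with SqlBulkCopy, which the question already uses for inserts.
CREATE TABLE DeleteIds (Id INT PRIMARY KEY);

-- One set-based delete joined to the staging table.
DELETE T
FROM Target T
INNER JOIN DeleteIds D ON D.Id = T.Id;

-- Clean up for the next run.
TRUNCATE TABLE DeleteIds;
```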

This doesn't sound like your situation, but when you really need it to run faster you can also try running the job during a period of lower load, or, if you have a maintenance window, you can switch to the bulk-logged recovery model for a while, though this is an option of last resort.

After completing the query, truncate the extra table or delete rows with your job ID.

If this still isn't fast enough, you can improve things further by implementing logical deletes in your application. A logical delete is when you add a column in the table with a name like IsDeleted or DeletedDate. Then, to delete a row you only need to update the value of that column. You can also have a separate process that runs in the background to clean up these records after a certain amount of time if you want.

Logical deletes can be much faster for processing larger batches. However, they can mean rewriting significant parts of your application to understand the difference between a deleted record and an active record.
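A logical delete along these lines might look like this, assuming the IDs to remove have been staged into an additional table (here called `DeleteIds`, an assumed name) as described at the top of this answer:

```sql
-- One-time schema change: add the soft-delete flag.
ALTER TABLE Target ADD IsDeleted BIT NOT NULL DEFAULT 0;

-- "Deleting" a batch is now just a flag update, which is cheap.
UPDATE T
SET IsDeleted = 1
FROM Target T
INNER JOIN DeleteIds D ON D.Id = T.Id;

-- A background cleanup job can physically remove flagged rows later.
DELETE FROM Target WHERE IsDeleted = 1;
```

Every query in the application then needs a `WHERE IsDeleted = 0` filter (or a view that applies it), which is the rewriting cost mentioned above.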

Joel Coehoorn

You can use the MERGE statement to delete data in bulk in SQL Server.

You need to specify the primary keys of the rows you want to delete and match them against your actual table. Then you can decide what to do with matched or unmatched records; in your case, delete when matched:

create table MyTable ( id int primary key, data varchar(5) )
insert into MyTable values (1, 'a'), (2, 'b'), (3, 'c')

merge MyTable as t
using
(
    select * from
    (
        values (1), (2)
    ) V(id)
) as x
on x.id = t.id
when matched then delete;

select * from MyTable -- only pk 3 left

This works very well when deleting large amounts of data, and it's easy enough to implement from C#, although it doesn't match the performance of SqlBulkCopy.

MERGE docs: https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql

Phate01

I would normally reuse a custom table type as part of an extended CRUD set of stored procedures when this situation arises.

CREATE PROCEDURE MyTableDeleteByPrimaryKeyInBulk(@PrimaryKeyTable IntegerIDTable READONLY)
AS
    DELETE D
    FROM MyTable D
    INNER JOIN @PrimaryKeyTable P ON P.ID = D.TableID
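The `IntegerIDTable` referenced above is a user-defined table type, which the answer assumes already exists; a minimal definition might look like this (note that table-valued parameters must always be declared `READONLY` in the procedure signature):

```sql
-- User-defined table type for passing a set of IDs as a table-valued parameter.
CREATE TYPE IntegerIDTable AS TABLE (ID INT PRIMARY KEY);
```

From C#, the type can be filled client-side in a DataTable and passed as a SqlParameter with `SqlDbType.Structured`.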

Of course, your mileage may vary based on the number and types of indexes and how busy the server is. I ran this against a fairly loaded server, set the threshold to 100K records, and it built up the data and performed the delete operation in less than 1 second.

DECLARE @RecordCount INT = 100000
;
WITH R1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(10))dt(n)),
R2(N) AS (SELECT 1 FROM R1 a, R1 b),
R3(N) AS (SELECT 1 FROM R2 a, R2 b), 
R4(N) AS (SELECT 1 FROM R3 a, R3 b),
Tally(Number) AS (SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM R4)

INSERT
INTO _DeleteTest
SELECT Number,'Hey' FROM Tally
WHERE
    Number<=100000

DECLARE @T TABLE(ID INT)
INSERT INTO @T SELECT ID FROM _DeleteTest

DELETE D
FROM _DeleteTest D
INNER JOIN @T P ON P.ID=D.ID


(100000 rows affected)

(100000 rows affected)

(100000 rows affected)

Completion time: 2020-11-13T13:53:54.2883938-05:00
Ross Bush
  • You would do this for 30K deletes? How does the performance compare to just using DELETE, e.g. 30 delete statements each specifying 1000 primary keys to delete? Seems like unnecessary complexity. In addition to being simpler and probably just as fast, code that uses ordinary delete statements is DB-agnostic. – Eric J. Nov 13 '20 at 17:48
  • This demonstrates a way to get 30K+ PK's to the database. Once there, the deletion of records could be done however you see fit. For brevity, I have omitted any code that involves staging tables and SQL job, ETL queues, or any similar appliances as I felt it was a bit out of scope for a bulk delete question. – Ross Bush Nov 13 '20 at 18:38
  • @Eric J. - 30K is not really a super large number to delete. In some cases, I will bump my bulk insert batch size to 15K and get a bigger boost out of that than using the normal batch size of something like 1000. – Ross Bush Nov 13 '20 at 18:58

I don't remember seeing a BulkDelete, but according to Microsoft (https://social.technet.microsoft.com/wiki/contents/articles/20651.sql-server-delete-a-huge-amount-of-data-from-a-table.aspx):

You can just apply TRUNCATE TABLE TableName; it works like a bulk operation and empties all records in the table. If that is not what you need, follow the previous advice from @Phate01.

Ruben
  • That would be fine if you wanted to remove all records from the table; however, if you wanted to remove 100 out of 1000 records, truncate would not be a proper solution. – Ross Bush Nov 13 '20 at 19:04

Simply using smaller deletes in a loop works for me in the "big delete" scenario.

Pro: this does not lock the whole table for a long time.

Con: no benefit if used inside a single transaction.

SELECT NULL;  -- seeds @@ROWCOUNT so the WHILE loop executes at least once
WHILE @@ROWCOUNT <> 0
BEGIN
    -- Each iteration removes at most 5000 rows, keeping locks short-lived.
    DELETE TOP (5000) FROM yourtable
    WHERE ...
END

30K is a small number of rows for SQL Server; I use this approach to delete millions of rows from huge tables.

There could be some other problem, like bad indexing, locking, a heap table, etc.

Xilmiki