6

I'm using this library to perform bulk delete in batches like following:

  while (castedEndedItems.Any())
  {
    var subList = castedEndedItems.Take(4000).ToList();
    DBRetry.Do(() => EFBatchOperation.For(ctx, ctx.SearchedUserItems).Where(r => subList.Any(a => a == r.ItemID)).Delete(), TimeSpan.FromSeconds(2));
    castedEndedItems.RemoveRange(0, subList.Count);
    Console.WriteLine("Completed a batch of ended items");
  }

As you can see guys I take a batch of 4000 items to delete at once and I pass them as argument to the query...

I'm using this library to perform bulk delete:

https://github.com/MikaelEliasson/EntityFramework.Utilities

However the performance like this is absolutely terrible... I tested the application couple of times and to delete the 80000 records for example it takes literally 40 minutes!?

I should note that that parameter by which I'm deleting (ItemID) is of varchar(400) type and it's indexed for performance reasons....

Is there any other library that I could possibly use or tweak this query to make it work faster, because currently the performance is absolutely terrible.. :/

User987
  • 3,663
  • 15
  • 54
  • 115
  • 1
    are you deleting everything? or just a subset of the rows? (i.e. would `truncate table` be an option?) – Marc Gravell Jan 28 '19 at 14:23
  • 2
    Use a SQL statement with Entity Framework raw queries. See [here](https://visualstudiomagazine.com/articles/2018/04/01/leveraging-raw-sql.aspx) for examples. Performance should be almost instantaneous. – Robert Harvey Jan 28 '19 at 14:23
  • 1
    @MarcGravell no as you can see I have a list where only matching items have to be deleted by parameter called "ItemID" - which is of varchar(250) type :) – User987 Jan 28 '19 at 14:23
  • 1
    @RobertHarvey I think SQLBulkCompy library implementation would be much better no ? SQLBulkCopy is made for these types of things.. I just wasn't able to find any with good implementation ^^ – User987 Jan 28 '19 at 14:25
  • 1
    Are you doing bulk copy or bulk delete? – Robert Harvey Jan 28 '19 at 14:25
  • @RobertHarvey bulk delete ^^ you can see from the query up above that I wrote :) – User987 Jan 28 '19 at 14:26
  • Then use a raw SQL query. SQL Server will hardly notice on 50,000 records. – Robert Harvey Jan 28 '19 at 14:26
  • @RobertHarvey Yes but notice that I have a list of ID's that need to be passed all at once ... – User987 Jan 28 '19 at 14:27
  • @RobertHarvey Also I'm not using .net core , but regular .net framework ^^ – User987 Jan 28 '19 at 14:27
  • Then use a subquery or Common Table Expression. Trust me, standard SQL is the way to go here. – Robert Harvey Jan 28 '19 at 14:27
  • Information on how to do raw queries in EF6 can be found here: https://learn.microsoft.com/en-us/ef/ef6/querying/raw-sql – Robert Harvey Jan 28 '19 at 14:28
  • In Entity Framework Core you can use `.RemoveRange(...)` You can read more here: https://learn.microsoft.com/en-us/ef/core/api/microsoft.entityframeworkcore.dbset-1 – Sebastian Siemens Jan 28 '19 at 14:28
  • @SebastianSiemens removerange performance is utterly terrible :/ – User987 Jan 28 '19 at 14:29
  • 1
    If you don’t need change tracking (i.e. to perform writes), add `.AsNoTracking()` to your query to ensure it will not be loaded into the context. The context attachment takes some time, and so does a call to `.SaveChanges()` if you have a couple of queries – one that requires changes and one that does not. This is exacerbated if there are a lot of results from those queries. – Sebastian Siemens Jan 28 '19 at 14:39
  • If you are simply changing a number of objects, you may not be aware that EF will check whether each of those new objects has changed. To avoid this, turn off `AutoDetectChangesEnabled` before chenging/deleting, then back on again afterwards. It does exactly what its name suggests. – Sebastian Siemens Jan 28 '19 at 14:41
  • Is your ItemId a foreign key elsewhere? It may be that you need better indexing in related tables.Your problem may not be any statement or library. – Anthony Pegram Jan 28 '19 at 16:32

3 Answers3

6

If you are prepared to use a stored procedure then you can do this without any external library:

  • Create the sproc using a table valued parameter @ids
  • Define a SQL type for that table valued parameter (just an id column assuming a simple PK)
  • In the sproc use

    delete from table where id in (select id from @ids);
    
  • In your application create a DataTable and populate to match the SQL table

  • Pass the data table as an command parameter when calling the sproc.

This answer illustrates the process.

Any other option will need to do the equivalent of this – or something less efficient.

Richard
  • 106,783
  • 21
  • 203
  • 265
  • There might be a limit on how many ids you can pass this way. – Robert Harvey Jan 28 '19 at 14:28
  • @RobertHarvey a TVP can have any number of rows; personally I'd probably still loop, though, as a *huge* batch could have trx issues – Marc Gravell Jan 28 '19 at 14:29
  • @RobertHarvey yes, but it is vastly more than the numbers talked about here (likely limited by the SQL Server batch size...). If that is an issue then do 10,000 at a time :-) – Richard Jan 28 '19 at 14:29
4

any EF solution here is probably going to perform lots of discreet operations. Instead, I would suggest manually building your SQL in a loop, something like:

using(var cmd = db.CreateCommand())
{
    int index = 0;
    var sql = new StringBuilder("delete from [SomeTable] where [SomeId] in (");
    foreach(var item in items)
    {
        if (index != 0) sql.Append(',');
        var name = "@id_" + index++;
        sql.Append(name);
        cmd.Parameters.AddWithValue(name, item.SomeId);            
    }
    cmd.CommandText = sql.Append(");").ToString();
    cmd.ExecuteNonQuery();
}

You may need to loop this in batches, though, as there is an upper limit on the number of parameters allowed on a command.

Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • Marc thx for the reply, I'd always prefer clean SQL statement instead of EF for performing batch deletions since it's much better... What do you think how much time would this take to delete 50000 records n table that totally has 180 million records? – User987 Jan 28 '19 at 14:31
  • @User987 Too much depends on the size of the records, whether the PK is clustered, what other indexes exist (their entries will need to be removed as well. And what kind of hardware you have, – Richard Jan 28 '19 at 14:33
  • P.S. The way how I match the items is by ItemID which is of type "Varchar"... I guess this causes performance bottleneck ? – User987 Jan 28 '19 at 14:34
  • @Marc Gravell so I'm guessing this would be even better than SQLBulkCopy ? – User987 Jan 28 '19 at 14:34
  • @Marc Gravell I've tried to use this piece of code but it says for example: "CreateCommand" doesn't exists in the context... – User987 Jan 28 '19 at 14:37
  • I prefer to do it with less statements, but I do also prefer to create a SQL statement with parameters instead of concatenating a string. – Sebastian Siemens Jan 28 '19 at 14:37
  • @User987 ultimately you need to get all the ids to the server, which will take some amount of bandwidth; as Richard notes, you could also use a table-valued-parameter to do the same thing - swings/roundabouts – Marc Gravell Jan 28 '19 at 14:42
  • @User987 in my code, I'm assuming that `db` is an ADO.NET `DbConnection` instance, not an EF context. You'll need a connection, and *that* most definitely has a `CreateCommand` method – Marc Gravell Jan 28 '19 at 14:56
  • Awesome code, I have been searching SO for nearly a year to find a Bulk delete and finally stumbled on this code. Thanks. – KyloRen Jul 13 '20 at 12:59
2

If you don't mind the extra dependency, you could use the NuGet package Z.EntityFramework.Plus.

The code is roughly as follows:

using Z.EntityFramework.Plus;
[...]
         using (yourDbContext context = new yourDbContext())
         {
              yourDbContext.yourDbSet.Where( yourWhereExpression ).Delete();
         }

It is simple and efficient. The documentation contains exact numbers about the performance.

Regarding licensing: As far as I know, version 1.8 has an MIT license: https://github.com/zzzprojects/EntityFramework-Plus/blob/master/LICENSE The newer version are not free to use.

Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
tomwaitforitmy
  • 509
  • 3
  • 16
  • I'm aware of this one but this is commercial library and it's not a free one... It's highly expensive :) – User987 Jan 28 '19 at 14:44
  • As far as I know, there is one with MIT license: https://github.com/zzzprojects/EntityFramework-Plus/blob/master/LICENSE – tomwaitforitmy Jan 28 '19 at 14:49
  • wait really ? Since when did they change it to MIT licence :O – User987 Jan 28 '19 at 14:54
  • Holy shit it really is MIT licence.. I've read so much about that library that it's awesome... I had no idea they offer it for free now ?o.O – User987 Jan 28 '19 at 14:55
  • I think it's an older version, but the NuGet package of version 1.8 links this license. The new version from the project page is 3.1... So 1.8 might not have the best performance and the coolest code style, but it might be worth a try. – tomwaitforitmy Jan 28 '19 at 15:00