14

I have a database table (running on SQL Server 2012 Express) that contains ~ 60,000 rows.

I am using the following code to purge old rows:

//Deleting CPU measurements older than (oldestAllowedTime)
var allCpuMeasurementsQuery = from curr in msdc.CpuMeasurements where 
    curr.Timestamp < oldestAllowedTime select curr;
foreach (var cpuMeasurement in allCpuMeasurementsQuery)
{
  msdc.CpuMeasurements.Remove(cpuMeasurement);
}

When the number of deleted rows is large (~90% or more of the records in the tables are being deleted) the operation takes exceptionally long. It takes about 30 minutes to finish this operation on an relatively strong machine (Intel I5 desktop).

  1. does this seem like a normal behavior?

  2. any ideas about what I can do to reduce the operation's time?

Thanks,

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
OSH
  • 2,847
  • 3
  • 25
  • 46
  • 3
    Look at DeleteAllOnSubmit, it may help. – DavidB May 08 '13 at 13:44
  • 2
    AFAIK what you're doing is running 60,000+ delete commands agains the database. If you could instead run just one command, or batches of a few hundred commands or less, you wouldn't have that performance problem. – Geeky Guy May 08 '13 at 13:44
  • 4
    Entity Framework isn't great at that sort of thing. Might be best to create a stored procedure that you pass in the timestamp and it deletes all of the records that way. – Belogix May 08 '13 at 13:44
  • I typically wrap these in a single transaction. If that doesn't improve things I will call into a SPROC to do the work since that will avoid a lot of network traffic. – itsmatt May 08 '13 at 13:44
  • I second Belogix. Entity framework isn't great and handling large numbers of records at the best of times. I'd write an SP to deal with this, or execute a query. – Simon May 08 '13 at 13:45
  • 1
    Also, do you have an index on the Timestamp column? – itsmatt May 08 '13 at 13:48
  • What attempts of trying to optimize this query have you done? Have you thought of running this query in segments? – Security Hound May 08 '13 at 13:48
  • What kind of database is this? – RBarryYoung May 08 '13 at 13:51
  • @itsmatt Ouch. I'm ashamed to say that the timestamp is not an index in the table. I'm going to stand in the corner now... – OSH May 08 '13 at 14:26

4 Answers4

20

Entity framework is not very good at handling bulk operations like this. You should use ExecuteStoreCommand to execute SQL directly against the data source in situations like this.

var deleteOld = "DELETE FROM CpuMeasurements WHERE curr.Timestamp < {0}";
msdc.ExecuteStoreCommand(deleteOld, oldestAllowedTime);

By doing so you don't need to load the entities into memory (just to delete them) and issue thousands of delete commands to the database.

Magnus
  • 45,362
  • 8
  • 80
  • 118
  • 2
    I appreciate all the suggestions here. I am choosing this solution for now as it seems the easiest (at least to me). I do think that the other solutions mentioned here seem promising ( EntityFramework.Extended), and I will check them out when things calm down. – OSH May 08 '13 at 14:31
14

You should look at EntityFramework.Extended it was created to help with both bulk deletions and updates.

Using it, you could simply do:

msdc.CpuMeasurements.Delete(curr => curr.Timestamp < oldestAllowedTime);
Mathew Thompson
  • 55,877
  • 15
  • 127
  • 148
  • Good recommendation on Extended but it looks like it is unsupported now. The github page links to this alternative which looks pretty solid [Entity Framework Plus](https://entityframework-plus.net/?z=ef-extended) – Uriah Blatherwick Mar 14 '23 at 15:31
5

The reason for this is that you execute a DB update for every single record. You need to do a bulk update.

EntityFramework.extended can handle this scenario.

WhileTrueSleep
  • 1,524
  • 1
  • 19
  • 32
Kenneth
  • 28,294
  • 6
  • 61
  • 84
-1

Deleting huge amounts of data can take a long time.

You might have to move the sql out of your application and run it as a single sql script via SQL Server Agent. It could be run, for example, once a day during the quietest period.

Joe Ratzer
  • 18,176
  • 3
  • 37
  • 51