1

I've got a command-line console application that saves approximately 150,000 rows to the database through LINQ. This works fine (where I would normally expect it to stop working). It's a bog-standard save-changes call after reading data from the CSV file:-

 List<Invoice> oldInvoices = db.Invoices.Where(x => !x.IsVisible).ToList();
 List<int> oldInvoiceIDs = oldInvoices.Select(s => s.InvoiceID).ToList();

 List<InvoiceProduct> allInvoiceProducts = db.InvoiceProducts.ToList();
 List<InvoiceProduct> oldInvoiceProducts = allInvoiceProducts.Where(x => oldInvoiceIDs.Contains(x.InvoiceID)).ToList();

 db.InvoiceProducts.RemoveRange(oldInvoiceProducts);
 db.Invoices.RemoveRange(oldInvoices);

 UpdateConsole.WriteLine("Switching over invoices completed. Please wait...", ConsoleColor.Black, ConsoleColor.Magenta);

The table is a list of invoices with a sub-linked table of products against each invoice. Each time we get new data, we write the new data in, marked in the database as not visible, then switch the currently visible data to invisible and the currently invisible data to visible, giving the effect of an immediate switch from one dataset to the next. The dataset that has just been marked invisible is then deleted through LINQ.
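
For illustration, the flip itself can be a single set-based statement so the swap is atomic; a minimal sketch, assuming an EF 6 DbContext and the Invoice table / IsVisible column from the code above:

 // Toggle every row in one statement: visible rows go hidden, hidden rows go live.
 db.Database.ExecuteSqlCommand(
     "UPDATE Invoice SET IsVisible = CASE WHEN IsVisible = 1 THEN 0 ELSE 1 END");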

The delete takes its time, but not an unreasonable amount. As this data comes from a CSV file, we log the number of rows and the start and end date/time of the read. This is stored in another database table, and the code to save it is:-

importLog.SuccessfullyImportedRows = successfulRows;
importLog.FailedImportedRows = failedRows;
importLog.EndTime = DateTime.Now;

db.SaveChanges();

This save takes in excess of 40 minutes and I've no idea why. The only thing I can think of is that it is using the same DBEntities context that was generated from the EDMX in Visual Studio?

Has anyone had this? It gives the appearance of the application hanging, but it does continue after 40 minutes or so...

Mike Upjohn
  • Possible duplicate of [How can I run this task faster](http://stackoverflow.com/questions/37374480/how-can-i-run-this-task-faster) – Veverke May 31 '16 at 09:25
  • This may help you: http://stackoverflow.com/questions/37096509/why-getting-data-with-entity-framwork-is-slow – mohsen May 31 '16 at 09:29
  • Thanks for the links there. The latter is regarding SaveChanges() inside a loop, which mine isn't. It's also worth noting that when I save the 177,000 records, I save to a local list, then do AddRange() and then SaveChanges(). I only call SaveChanges() once for saving the list and a second time for saving the log. Can the same problem still exist without any looped SaveChanges()? – Mike Upjohn May 31 '16 at 09:35

3 Answers

1

You have multiple performance issues in your approach:

  1. Dragging unnecessary data from the database.
  2. Bulk-inserting a huge number of records.
  3. Bulk-deleting a huge number of records.

There's no need to drag all invoices from the database and then filter them locally in memory, when you can query them directly in the database and retrieve only the list that you want.

You need to replace this:

 List<InvoiceProduct> allInvoiceProducts = db.InvoiceProducts.ToList();
 List<InvoiceProduct> oldInvoiceProducts = allInvoiceProducts.Where(x => oldInvoiceIDs.Contains(x.InvoiceID)).ToList();

with:

 List<InvoiceProduct> oldInvoiceProducts = db.InvoiceProducts.Where(x => oldInvoiceIDs.Contains(x.InvoiceID)).ToList();

For Bulk Delete faster approach:

 String commaDelimitedIds = String.Join(",", oldInvoiceIDs);
 String query = "DELETE FROM Invoice WHERE InvoiceID IN (" + commaDelimitedIds + ")";
 // ExecuteSqlCommand runs a non-query statement; on an ObjectContext-based
 // EDMX use ExecuteStoreCommand instead.
 db.Database.ExecuteSqlCommand(query);
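
One caveat: the InvoiceProduct child rows need to go first, or the foreign key will reject the parent delete; a sketch under the same assumptions (table names matching the entity names):

 // Delete the child rows before the invoices above to satisfy the FK.
 db.Database.ExecuteSqlCommand(
     "DELETE FROM InvoiceProduct WHERE InvoiceID IN (" + commaDelimitedIds + ")");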

Inserting 150,000 records via LINQ to SQL is not a good idea; this will generate 150,000 INSERT statements (not to mention the related objects).

Have a look at this example: SqlBulkCopy, which is ideal for huge inserts.
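
A minimal SqlBulkCopy sketch (requires System.Data and System.Data.SqlClient); the column names and connectionString here are assumptions for illustration, not taken from the question:

 DataTable table = new DataTable();
 table.Columns.Add("InvoiceID", typeof(int));     // assumed column
 table.Columns.Add("IsVisible", typeof(bool));    // assumed column
 // ... add one table.Rows entry per parsed CSV line ...

 using (SqlBulkCopy bulk = new SqlBulkCopy(connectionString))
 {
     bulk.DestinationTableName = "Invoice";
     bulk.BatchSize = 10000;   // commit in chunks instead of one huge batch
     bulk.WriteToServer(table);
 }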

In general, ORMs are not a good fit for bulk operations.

Zein Makki
  • Hi and thanks for your reply. You are correct in saying that I was dragging unnecessary data from the database. The problem I had beforehand was that the third line of code after "with:" threw an error because .Contains() had too many integers in the list. Hence I selected into a local list in an extra line and then took the approach above. Much appreciated for your help! I will give it a go. – Mike Upjohn May 31 '16 at 10:17
  • 1
    @MikeUpjohn You can split each 2000 ID together using a for loop, and `Concat` the lists together in the end. This is a workaround for the SQL-parameters limitation. – Zein Makki May 31 '16 at 10:20
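
A minimal sketch of the chunking workaround from the comment above, using batches of 2,000 to stay under SQL Server's ~2,100-parameter limit:

 const int chunkSize = 2000;
 List<InvoiceProduct> oldInvoiceProducts = new List<InvoiceProduct>();

 for (int i = 0; i < oldInvoiceIDs.Count; i += chunkSize)
 {
     // Each small batch keeps the generated IN (...) clause within limits.
     List<int> batch = oldInvoiceIDs.Skip(i).Take(chunkSize).ToList();
     oldInvoiceProducts.AddRange(
         db.InvoiceProducts.Where(x => batch.Contains(x.InvoiceID)));
 }
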
1

First of all, I see a problem in your query's use of .ToList(). ToList() forces the query to run immediately and stores the results in memory. That's fine for small data, but with over 150,000 rows you will surely hit performance problems and risk running out of memory. You can use AsQueryable() instead.

AsQueryable() just creates a query: the instructions needed to get a list. You can make further changes to the query later, such as adding new Where clauses that get sent all the way down to the database level.
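
For illustration, nothing hits the database until the query is materialized, so filters added along the way are all folded into one SQL statement (the date filter below is hypothetical):

 IQueryable<Invoice> query = db.Invoices.Where(x => !x.IsVisible);

 // No SQL has run yet; this Where is appended to the pending query.
 query = query.Where(x => x.CreatedDate < DateTime.Today);   // hypothetical column

 // The SQL executes here, returning only the filtered rows.
 List<int> ids = query.Select(x => x.InvoiceID).ToList();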

In EF 6 or later, RemoveRange performs very well, so I do not think RemoveRange is the root cause here. But if you want to improve performance further, try this extension; it's really nice: https://efbulkinsert.codeplex.com/
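
If I read the project page right, the extension hangs a BulkInsert method off the context; a hedged sketch, so double-check the exact signature there:

 using EntityFramework.BulkInsert.Extensions;   // from the linked package

 using (DBEntities db = new DBEntities())
 {
     // One streamed bulk operation instead of 150,000 tracked INSERTs.
     db.BulkInsert(newInvoices);   // newInvoices: the parsed CSV entities
 }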

nguyenhoai890
0

OK, here's the solution I've found (though I'm not sure of the reasoning). If I don't pass the entities object into the function for logging, and instead create a new instance of the context generated from the EDMX, e.g.:-

using(DBEntities db = new DBEntities()) {
    // The log row must be tracked by this fresh context before SaveChanges
    // will persist it ("ImportLogs" is the assumed set name; the row was
    // created and saved earlier, when the import started).
    db.ImportLogs.Attach(importLog);

    importLog.SuccessfullyImportedRows = successfulRows;
    importLog.FailedImportedRows = failedRows;
    importLog.EndTime = DateTime.Now;

    db.SaveChanges();
}

This works in less than a second. Something being cached in the original instance of DBEntities from inserting so many rows?
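
That matches EF's change tracking: SaveChanges runs DetectChanges, which scans every entity the context is still tracking, so a context holding ~150,000 freshly inserted entities makes even a one-row save crawl. An alternative to a fresh context is to switch the automatic scan off around the bulk work; a sketch assuming a DbContext-based EDMX:

 db.Configuration.AutoDetectChangesEnabled = false;  // skip the per-call scan
 try
 {
     db.Invoices.AddRange(newInvoices);   // newInvoices: the parsed CSV rows
     db.SaveChanges();
 }
 finally
 {
     db.Configuration.AutoDetectChangesEnabled = true;
 }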

Mike Upjohn