32

I have to import about 30k rows from a CSV file to my SQL database, this sadly takes 20 minutes.

Troubleshooting with a profiler shows me that DbSet.Add is taking the most time, but why?

I have these Entity Framework Code-First classes:

public class Article
{
    // About 20 properties, each property doesn't store excessive amounts of data
}

public class Database : DbContext
{
    public DbSet<Article> Articles { get; set; }
}

For each item in my for loop I do:

db.Articles.Add(article);

Outside the for loop I do:

db.SaveChanges();

It's connected with my local SQLExpress server, but I guess there isn't anything written till SaveChanges is being called so I guess the server won't be the problem....

Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
Tamara Wijsman
  • 12,198
  • 8
  • 53
  • 82
  • 1
    Hello. Did you get rid of Entity Framework or used sqlbulkcopy together with EF? I am getting the exact same issue with .Add() – Kervin Ramen Mar 29 '11 at 08:10
  • 7
    If you set these: `db.Configuration.ValidateOnSaveEnabled = false; db.Configuration.AutoDetectChangesEnabled = false;` There is a huge performance gain. You must be sure of your values tough. – Kervin Ramen Mar 29 '11 at 11:19
  • Use backticks ( ` ) for code in comments. Looks interesting, I'll look into those properties later... – Tamara Wijsman Mar 29 '11 at 11:24

5 Answers5

47

As per Kevin Ramen's comment (Mar 29) I can confirm that setting db.Configuration.AutoDetectChangesEnabled = false makes a huge difference in speed

Running Add() on 2324 items by default ran 3min 15sec on my machine, disabling the auto-detection resulted in the operation completing in 0.5sec.

http://blog.larud.net/archive/2011/07/12/bulk-load-items-to-a-ef-4-1-code-first-aspx

Rudi
  • 3,124
  • 26
  • 35
  • This is an awesome thing to know!! Fixed a huge problem I had inserting 4k records using EF without redoing my code to use bulk copy. I think bulk copy is an easy answer that people do without analyzing the problem further. In my case the sql-insert was taking <1s and the EF add was taking 30-40 seconds so this workaround works perfectly. Thanks for the information! – Alex Dec 06 '11 at 20:37
23

I'm going to add to Kervin Ramen's comment by saying that if you are only doing inserts (no updates or deletes) then you can, in general, safely set the following properties before doing any inserts on the context:

DbContext.Configuration.AutoDetectChangesEnabled = false;
DbContext.Configuration.ValidateOnSaveEnabled = false;

I was having a problem with a once-off bulk import at my work. Without setting the above properties, adding about 7500 complicated objects to the context was taking over 30 minutes. Setting the above properties (so disabling EF checks and change tracking) reduced the import down to seconds.

But, again, I stress only use this if you are doing inserts. If you need to mix inserts with updates/deletes you can split your code into two paths and disable the EF checks for the insert part and then re-enable the checks for the update/delete path. I have used this approach succesfully to get around the slow DbSet.Add() behaviour.

spoida
  • 2,655
  • 1
  • 23
  • 14
  • This is certainly amazing and I might try this out and compare it to bulk inserts. Thank you for sharing! Also thanks for reminding, seems I forgot about that comment but I'll look into this tomorrow evening for sure... – Tamara Wijsman Oct 17 '12 at 00:12
  • Upon trying, this seems slower than bulk insertions so I can't use this approach. In detail I'm doing 350.000 `.Add()`s (entities with no references to other entities, just fields with reasonable values) followed by a `.SaveChanges()`, setting these to false before calling adds or save changes and back to true after saving changes; takes a lot longer than the bulk inserts so I don't even bother to leave it running. – Tamara Wijsman Oct 18 '12 at 16:05
  • I can't believe it. This made my day and my boss will be happy. Works like a charm :) – Alireza Dec 13 '16 at 06:25
10

Each item in a unit-of-work has overhead, as it must check (and update) the identity manager, add to various collections, etc.

The first thing I would try is batching into, say, groups of 500 (change that number to suit), starting with a fresh (new) object-context each time - as otherwise you can reasonably expect telescoping performance. Breaking it into batches also prevents a megalithic transaction bringing everything to a stop.

Beyond that; SqlBulkCopy. It is designed for large imports with minimal overhead. It isn't EF though.

Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • +1 if applicable in your design I would definitely go with SqlBulkCopy. – Enrico Campidoglio Dec 04 '10 at 21:10
  • I'm trying to accomplish something with this now, but I wonder if it will just accept matching based on the Column names and not on its properties... – Tamara Wijsman Dec 04 '10 at 21:26
  • 2
    The groups suggestion made it a little faster but it didn't work fast enough. After some iterations through nasty errors I got SqlBulkCopy working, it's nasty code though but it works. Might refactor it or check if they have support for bulk inserting later... Thank you Marc and the people on the chat which made a similar suggestion! And look, something that took **20 minutes** now takes **2 seconds**, it's magic... – Tamara Wijsman Dec 04 '10 at 22:29
  • @tomwij that is a pretty respectable result – Marc Gravell Dec 04 '10 at 22:49
  • This takes `SqlBulkCopy` and makes it more usable in an EF context. https://ruijarimba.wordpress.com/2012/03/25/bulk-insert-dot-net-applications-part1/ See also https://ruijarimba.wordpress.com/2012/03/18/entity-framework-get-mapped-table-name-from-an-entity/ – xan Dec 31 '14 at 17:47
5

There is an extremely easy to use and very fast extension here: https://efbulkinsert.codeplex.com/

It's called "Entity Framework Bulk Insert".

Extension itself is in namespace EntityFramework.BulkInsert.Extensions. So to reveal the extension method add using

using EntityFramework.BulkInsert.Extensions;

And then you can do this

context.BulkInsert(entities);

BTW - If you do not wish to use this extension for some reason, you could also try instead of running db.Articles.Add(article) for each article, to create each time a list of several articles and then use AddRange (new in EF version 6, along with RemoveRange) to add them together to the dbcontext.

ScottB
  • 1,363
  • 2
  • 14
  • 24
Arie Livshin
  • 865
  • 1
  • 12
  • 26
  • Unfortunately I get a `The given key was not present in the dictionary` error and there doesn't seem to be a great answer via [here](http://stackoverflow.com/a/26427216/845584) – PeterX Mar 03 '15 at 07:45
  • It skips the validation calls on each row, doing one validation at the end – ScottB Sep 14 '15 at 01:27
  • It's not about validation. The key difference is that `Add` does an expensive `DetectChanges` call internally each time. `AddRange` does it only once after having added all items. – Gert Arnold Apr 15 '23 at 19:42
1

I haven't really tried this, but my logic would be to hold on to ODBC driver to load file into datatable and then to use sql stored procedure to pass table to procedure.

For the first part, try: http://www.c-sharpcorner.com/UploadFile/mahesh/AccessTextDb12052005071306AM/AccessTextDb.aspx

For the second part try this for SQL procedure: http://www.builderau.com.au/program/sqlserver/soa/Passing-table-valued-parameters-in-SQL-Server-2008/0,339028455,339282577,00.htm

And create SqlCommnand object in c# and add to its Parameters collection SqlParameter that is SqlDbType.Structured

Well, I hope it helps.

Ivan Ičin
  • 9,672
  • 5
  • 36
  • 57