
First of all, I am aware that there are many other questions out there on this exact topic. However, those questions are old. For example,

Entity Framework is Too Slow. What are my options? was asked four years ago, and the framework has had at least one major version update since then, so I am hoping something newer can help me out on this topic.

My problem in detail is this:

I have a data source that I have to parse, and I insert the results into SQL Server. Since the parsing itself is fast, the overhead clearly comes from Entity Framework.

I need to insert data into a few dozen tables, around 80 of them. Each table has about 8 columns on average, one of which is an auto-incrementing id column. On each run I typically insert a few thousand rows into roughly 20 of the tables, so some tables have grown quite large (maybe 100k rows?).

The tables are related to each other in a simple star shape, like this:

                         table5  table6
                             \    /
           table3-----------\ \  / /-----------------table4
      table1 ------------ record table ------------------ table2 

Just imagine the main table with 80 tables around it. About 90% of the relations are one-to-many, some are many-to-many, and a few are one-to-one.

The way I do the insertion is: I add all the results to the context using the AddRange method, and I call SaveChanges once at the end.
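A minimal sketch of that approach (the context, set, and method names here are illustrative placeholders, not my real types):

```csharp
// Current approach: parse everything, add it all to one context,
// and call SaveChanges once. MyContext, Records, and ParseSource
// are placeholder names for this sketch.
using (var ctx = new MyContext())
{
    var results = ParseSource();   // fast; parsing is not the bottleneck
    ctx.Records.AddRange(results); // add everything in one go
    ctx.SaveChanges();             // single global save
}
```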

I observed the insertion performance with the current setup: at the very beginning an insert took a few seconds, then minutes, and now it takes 30 minutes to insert roughly the same amount of data.

So my question is: is there any good way to speed up the insertion? I am aware of bulk insert, but I have inheritance relations, so I am looking for some other way that will not break the relation constraints.

Would it help in this situation to define a stored procedure to handle the insertion? I'm sorry if that is naive; I have only been learning this for a few weeks and am hoping it is the right tool for the business logic.

  • Entity Framework, as an ORM, is adept at converting tables to data and vice versa. However, converting data from some data source into objects only to convert them again into another data store is terribly inefficient. There have been efforts to improve this kind of activity, but most agree this isn't really the kind of thing that should be done in any ORM. – Claies Sep 25 '15 at 02:51
  • @Claies Yes, I agree with you, but my boss doesn't. I am required to parse some raw data and put it into an RDBMS so that he and others can query it in relational form, so the scenario isn't that important here; the data and relations are reasonably simple. I am hoping for help on improving the performance rather than comments on the architecture, which I cannot change. Please focus on my problem only. – Jason Hu Sep 25 '15 at 02:57
  • @Claies FYI, the raw data is human-readable text and I need to convert it into relations, which is a purely painful job. So in that context, I totally agree with your point. – Jason Hu Sep 25 '15 at 02:59
  • 1
    well if I do a search for "Entity Framework Bulk Insert" here on SO, I find 154 results; On the first page alone, half of them say the same thing I have here, and multiples of them refer to this http://stackoverflow.com/questions/5940225/fastest-way-of-inserting-in-entity-framework. There's already plenty of information here. – Claies Sep 25 '15 at 03:05
  • you might just need to add `context.Configuration.AutoDetectChangesEnabled = false;`. Essentially, when EF is set to autodetect changes, it will try to update the object for each item in the range when something changes, like an `id` being generated by the server. If you turn off this change detection, then you won't get that `id` property in this instance of the object, but the next time you create a `dbContext` and query the `db`, the value will be populated. Just this one change can drastically improve performance. – Claies Sep 25 '15 at 03:11
  • @Claies Thanks, I am aware of these. Bulk insert, as I said, is limited to updating one table at a time and commits right away, so I am not sure it fits my case. As for change detection, I hadn't dared to turn it off, since I didn't know what the detection covers; thanks to your explanation, I will try it. Is there a good profiler I can use to investigate further? – Jason Hu Sep 25 '15 at 03:48
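For reference, the change suggested in the comments above can be sketched like this (EF6-style API; `MyContext`, `Records`, and `records` are placeholder names for this sketch):

```csharp
using (var ctx = new MyContext())
{
    // Disable automatic DetectChanges scans while adding many
    // entities; each scan walks every tracked entity, so skipping
    // it can save a lot of repeated work during large inserts.
    ctx.Configuration.AutoDetectChangesEnabled = false;
    try
    {
        ctx.Records.AddRange(records);
        ctx.SaveChanges();
    }
    finally
    {
        // Restore the default so later operations behave normally.
        ctx.Configuration.AutoDetectChangesEnabled = true;
    }
}
```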

1 Answer


You need to renew your context for each batch you insert.

e.g.:

foreach (var batch in batches)
{
    using (var ctx = new MyContext())
    {
        // do one batch of inserts here, then dispose the context
        ctx.SaveChanges();
    }
}

If you try reusing the context, it will still be tracking all of the entries you inserted previously. The change-tracking graph gets dramatically slower to process as the number of tracked entities grows.
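One possible way to wire the batching up (the batch size and all names here are illustrative; the right size is something to measure for your workload):

```csharp
// Split the parsed results into fixed-size batches and save each
// batch in a fresh context, so the change tracker stays small.
const int batchSize = 1000; // arbitrary starting point; tune it
for (int i = 0; i < results.Count; i += batchSize)
{
    var batch = results.Skip(i).Take(batchSize).ToList();
    using (var ctx = new MyContext())
    {
        ctx.Records.AddRange(batch);
        ctx.SaveChanges();
    }
}
```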

  • Thanks, I am aware of this too. I have a manager class that collects all the data, inserts it in one single shot, and then disposes of itself; I don't reuse any context at all. So may I ask how you probe the performance of EF and know where to improve? – Jason Hu Sep 25 '15 at 12:57
  • @HuStmpHrrr I would get out a profiler and check it out, I use a mix of SQL Profiler (built into SQL Server Management Studio) and EF Profiler (paid but very good and easy to use) – undefined Sep 28 '15 at 00:53