0

I'm using Entity Framework to build a database. There's two models; Workers and Skills. Each Worker has zero or more Skills. I initially read this data into memory from a CSV file somewhere, and store it in a dictionary called allWorkers. Next, I write the data to the database as such:

// Populate database
using (var db = new SolverDbContext())
{
   // Add all distinct skills to database
   db.Skills.AddRange(allSkills
      .Distinct(StringComparer.InvariantCultureIgnoreCase)
      .Select(s => new Skill
      {
         Reference = s
      }));

   db.SaveChanges(); // Very quick
   var dbSkills = db.Skills.ToDictionary(k => k.Reference, v => v);

   // Add all workers to database
   var workforce = allWorkers.Values
      .Select(i => new Worker
      {
         Reference = i.EMPLOYEE_REF,
         Skills = i.GetSkills().Select(s => dbSkills[s]).ToArray(),
         DefaultRegion = "wa",
         DefaultEfficiency = i.TECH_EFFICIENCY
      });

   db.Workers.AddRange(workforce);
   db.SaveChanges(); // This call takes 00:05:00.0482197
}

The last db.SaveChanges(); takes over five minutes to execute, which I feel is far too long. I ran SQL Server Profiler as the call is executing, and basically what I found was thousands of calls to:

INSERT [dbo].[SkillWorkers]([Skill_SkillId], [Worker_WorkerId])
VALUES (@0, @1)

There are 16,027 rows being added to SkillWorkers, which is a fair amount of data but not huge by any means. Is there any way to optimize this code so it doesn't take 5min to run?

Update: I've looked at other possible duplicates, such as this one, but I don't think they apply. First, I'm not bulk adding anything in a loop. I'm doing a single call to db.SaveChanges(); after every row has been added to db.Workers. This should be the fastest way to bulk insert. Second, I've set db.Configuration.AutoDetectChangesEnabled to false. The SaveChanges() call now takes 00:05:11.2273888 (In other words, about the same). I don't think this really matters since every row is new, thus there are no changes to detect.

I think what I'm looking for is a way to issue a single UPDATE statement containing all 16,000 skills.

Community
  • 1
  • 1
Mike Christensen
  • 88,082
  • 50
  • 208
  • 326
  • 3
    Have you tried searching? There's plenty of questions about bulk inserting in Entity Framework, see for example [Fastest Way of Inserting in Entity Framework](http://stackoverflow.com/questions/5940225/fastest-way-of-inserting-in-entity-framework). – CodeCaster Nov 09 '15 at 19:02
  • Possibly disabling autodetect changes: http://stackoverflow.com/questions/5943394/why-is-inserting-entities-in-ef-4-1-so-slow-compared-to-objectcontext/5943699#5943699 – GendoIkari Nov 09 '15 at 19:04
  • Looks like a duplicate of http://stackoverflow.com/questions/5940225/fastest-way-of-inserting-in-entity-framework to me. – JamieSee Nov 09 '15 at 19:06
  • Running again after setting `AutoDetectChangesEnabled` to false. We'll see how much of a difference that makes. – Mike Christensen Nov 09 '15 at 19:11
  • Still about the same. – Mike Christensen Nov 09 '15 at 19:21
  • 1
    [Here is a third party library that makes handling bulk inserts easier](https://efbulkinsert.codeplex.com/). – JD Davis Nov 09 '15 at 19:23
  • @Jdsfighter - Oh cool, that might be worth checking out! Yea I was thinking about just crafting the INSERT statement myself and running it manually.. – Mike Christensen Nov 09 '15 at 19:42
  • @Jdsfighter - Doh! Looks like that library won't work in my situation. `BulkInsert` doesn't also automatically insert referenced entities. I also don't have a model for `WorkerSkills` so I can't create a bunch of those to Bulk Insert. Bummer.. – Mike Christensen Nov 09 '15 at 20:12
  • I haven't used it but heard good things about [EntifyFrameworkExtensions](https://github.com/loresoft/EntityFramework.Extended). Give it a try!! – vendettamit Nov 09 '15 at 20:44
  • @vendettamit - Looks like they have batched updates and deletes which is great, but nothing to bulk insert. – Mike Christensen Nov 09 '15 at 21:06
  • @Jdsfighter - I reworked my models to support the BulkInsert extension method. The total time is now `00:00:01.1812081`, way cool! Feel free to add this as an answer and I'll accept it. – Mike Christensen Nov 09 '15 at 21:07

1 Answers1

1

One easy method is by using the EntityFramework.BulkInsert extension.

You can then do:

// Add all workers to database
var workforce = allWorkers.Values
   .Select(i => new Worker
   {
      Reference = i.EMPLOYEE_REF,
      Skills = i.GetSkills().Select(s => dbSkills[s]).ToArray(),
      DefaultRegion = "wa",
      DefaultEfficiency = i.TECH_EFFICIENCY
   });

db.BulkInsert(workforce);
Mike Christensen
  • 88,082
  • 50
  • 208
  • 326
JD Davis
  • 3,517
  • 4
  • 28
  • 61