8

I'm inserting a lot of data into SQL Server CE 4.0 using Entity Framework 4.2 (code-first), and the performance is abysmal when compared to direct SQL insertion.

The model is very simple:

public class DocMember
{
    public DocMember() { this.Items = new List<DocItem>(); }

    public int Id { get; set; }

    public string Name { get; set; }
    public string MemberType { get; set; }
    public string AssemblyName { get; set; }

    public virtual IList<DocItem> Items { get; set; }
}

public class DocItem
{
    public int Id { get; set; }
    public DocMember Member { get; set; }
    public string PartType { get; set; }
    public string PartName { get; set; }
    public string Text { get; set; }
}

I have 2623 DocMembers and a total of of 7747 DocItems to insert, and I'm getting the following execution times:

With SQL: 00:00:02.8
With EF:  00:03:02.2

I can understand there's a bit of overhead with EF, but it is 65 times slower than SQL!

Perhaps there's a problem in my code, but it is quite straightforward and I can't see what could be wrong:

    private TimeSpan ImportMembersEF(IList<DocMember> members)
    {
        using (var db = new DocEntities())
        {
            db.Database.CreateIfNotExists();

            var sw = Stopwatch.StartNew();
            foreach (var m in members)
            {
                db.Members.Add(m);
            }

            db.SaveChanges();
            sw.Stop();
            return sw.Elapsed;
        }
    }

I also tried to call SaveChanges for each inserted item, or every 100 or 200 items, to no avail (it actually makes it worse).

Is there a way to improve the performance, or do I have to use SQL for batch inserts?


EDIT: for completeness, here's the code for the SQL insertion: http://pastebin.com/aeaC1KcB

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Thomas Levesque
  • 286,951
  • 70
  • 623
  • 758
  • Are the `Id`s auto generated on server? – Eranga Jan 23 '12 at 23:43
  • I have no idea if this has anything to do with the slow performance, but I'm wondering whether, according to convention, `public DocMemeber Member { get; set; }` should be `public virtual DocMember DocMember { get; set; }` and also whether there should be a member `public int DocMember DocMemberId { get; set; }`. – devuxer Jan 23 '12 at 23:52
  • @Eranga, yes. I didn't specify anything about this, it's the default behavior. – Thomas Levesque Jan 23 '12 at 23:59
  • @DanM, I tried to make the changes you suggest, but it's not better. Anyway, making this member virtual is only relevant when you load the data from the DB, not when you insert it. – Thomas Levesque Jan 24 '12 at 00:06
  • @Thomas, okay, figured that was a long shot, but looking at the code again, I have another thought: what would happen if you waited until all the `DocMember`s are added to the database before you hook up the `DocItem`s...then add the `DocItem`s? I'm thinking that each time it adds a member, it needs to run a query to figure out what the associated items are, and then add those. – devuxer Jan 24 '12 at 00:14
  • @DanM, I tried this as well, the result is much worse (more than 8 minutes) – Thomas Levesque Jan 24 '12 at 01:41
  • @Thomas, see this question: http://stackoverflow.com/questions/6107206/improving-bulk-insert-performance-in-entity-frame-work. I think what you're running into is unfortunately a fact of life for EntityFramework (at least for the current version). – devuxer Jan 24 '12 at 02:59
  • Well, it's a known fact and well documented that basically any ORM isn't really suited for **bulk operations**. An ORM shines when it comes to handling single objects, or smallish numbers of objects. That's why there are bulk operations (outside of the ORM space) to handle large numbers of rows to be inserted. – marc_s Jan 24 '12 at 06:01
  • 1
    I would try to not use autogenerated Id and I would try to use inserting each member with separate context - both just to diagnose the issue. – Ladislav Mrnka Jan 24 '12 at 09:14
  • @marc_s, unfortunately I suspect you are correct... so I guess I'm stuck with SQL – Thomas Levesque Jan 24 '12 at 09:45
  • It has surfaced that this was a bug up to EF5, it will be fixed for EF 6. See: http://stackoverflow.com/questions/14768394/exponentially-deteriorating-performance-on-inserts-in-sql-server-compact-4-0-tab – Dabblernl Mar 30 '13 at 15:59

2 Answers2

6

You can use my SqlCeBulkCopy library for loading bulk data, it mimics the SqlBulkCopy api: http://sqlcebulkcopy.codeplex.com

ErikEJ
  • 40,951
  • 5
  • 75
  • 115
  • Thanks, but it doesn't answer my question... I don't have performance issues if I do it in SQL (although it could probably be improved by using bulk copy), my problem is with Entity Framework – Thomas Levesque Jan 24 '12 at 09:21
  • Actually I just noticed something interesting: "the WriteToServer method also accepts a DataTable, an IEnumerable or and **IEnumerable**". Does it accept POCOs, or does it have to be EF generated entities? – Thomas Levesque Jan 24 '12 at 09:34
  • No, it can be any list of objects, ie POCOs – ErikEJ Jan 24 '12 at 12:28
  • Hi Erik, I just found the time to try this solution. It's really fast, but there's a problem: my IDs are auto-generated, and SqlCeBulkCopy doesn't update my entities with the generated values... Is there a way around it? – Thomas Levesque Feb 11 '12 at 23:27
  • Not sure aht you mean - you can set identity handling via options (BulkCopyOptions enum)? – ErikEJ Feb 12 '12 at 11:18
  • I don't set the ID explicitly on my entities, because this ID is generated by the database. The problem is that the Id property on my DocMember entities is not updated by SqlCeBulkCopy, so when I want to insert the DocItem entities, I don't know what the value of the associated DocMember is... Is there a recommended way of handling associations with bulk copy ? – Thomas Levesque Feb 12 '12 at 17:05
  • I managed to make it work, but I had to reload the DocMembers from the DB and join them with the list in memory (based on another unique field) to update the Id... Is there a better solution? – Thomas Levesque Feb 12 '12 at 17:15
  • Probably not, unless you use GUIDs as IDs instead, and assign them in code – ErikEJ Feb 13 '12 at 11:10
  • OK, thanks anyway. Even with this limitation, I guess it's still the best approach in my case... – Thomas Levesque Feb 13 '12 at 11:27
  • From 45 seconds to 3 seconds. Awesome! – devuxer Feb 20 '14 at 06:21
1

It's slow because it's not batching the insert.

When you insert using identity on the db, it has to select the result ID out after each item to assign to the model. This makes it really slow.

Your adhoc SQL isn't selecting the ID out so when batched you can submit all the statements at once.

Altho written with NHibernate:

http://www.philliphaydon.com/2011/09/the-benefits-of-letting-the-orm-generate-the-identity-part-1/

I wrote about using ORM generated IDs vs SQL Generated IDs.

Phill
  • 18,398
  • 7
  • 62
  • 102
  • My adhoc SQL is not batching the inserts either, and it does retrieve the ID... I'll update the question. – Thomas Levesque Jan 24 '12 at 00:17
  • Is it selecting the ID after each insert and assigning it to the model? – Phill Jan 24 '12 at 00:20
  • Interesting, that is definitely strange. I might crank open DotTrace when I get home and take a look at this. adhoc sql will always be faster but that much faster? Something strange is going on for EF to be that slow... – Phill Jan 24 '12 at 00:27