4

This is a follow-up from a previous question, in which I was trying to figure out the main cause for my code running slowly. I think I've narrowed it down to a minimal example below. I have a basic database structure as follows:

public class Foo
{
    public int Id { get; set; }
    public string Bar { get; set; }
}

public class FooContext : DbContext
{
    public DbSet<Foo> Foos { get; set; }
}

Now, if I had a list of Foo objects, and wanted to add them to the database, the suggested way would be to use AddRange(). But I noticed it was taking a long time, and is affected poorly by the number of items in the collection, even with a small amount like 200. So I wrote it out manually, and viola, it runs faster!

class Program
{
    static void Main(string[] args)
    {
        var foos = Enumerable.Range(0, 200).Select(index => new Foo { Bar = index.ToString() });

        // Make sure the timing doesn't include the first connection
        using (var context = new FooContext())
        {
            context.Database.Connection.Open();
        }

        var s1 = Stopwatch.StartNew();
        using (var context = new FooContext())
        {
            context.Foos.AddRange(foos);
            context.SaveChanges();
        }
        s1.Stop();

        var s2 = Stopwatch.StartNew();
        using (var context = new FooContext())
        {
            // Ignore the lack of sanitization, this is for demonstration purposes
            var query = string.Join(";\n", foos.Select(f => "INSERT INTO Foos ([Bar]) VALUES (" + f.Bar + ")"));
            context.Database.ExecuteSqlCommand(query);
        }
        s2.Stop();

        Console.WriteLine("Normal way: {0}", s1.Elapsed);
        Console.WriteLine("Hard way  : {0}", s2.Elapsed);
        Console.ReadKey();
    }
}

My initial thought was that Entity Framework might be using a separate transaction for each entry, but logging the SQL shows that's not the case. So why is there such a difference in execution time?

Andrew Williamson
  • 8,299
  • 3
  • 34
  • 62
  • 2
    Why are you guessing? Look at the statements actually issued with SQL Server Profiler. – Jeroen Mostert Jul 20 '17 at 05:46
  • context.parents.Add() does not even access the database so there's not a single transaction involved on this line. – dnickless Jul 20 '17 at 05:47
  • `SaveChanges` uses single transaction, but executes multiple SQL commands - one for each record `INSERT`, `UPDATE` or `DELETE`. – Ivan Stoev Jul 20 '17 at 05:50
  • How are you measuring your execution time (the async code might be a bit confusing here...) and how big are the differences? Are you running in release mode? Is there any slow Entity Framework logging turned on? – dnickless Jul 20 '17 at 05:55
  • 1
    @dnickless I switched it to synchronous to measure execution time, guess I should update the sample to match. The difference is a factor of _seconds_ for the standard EF way, and _hundredths of seconds_ for my alternative way. No, there is no logging turned on in this sample that might slow it down. – Andrew Williamson Jul 20 '17 at 19:44
  • To get a valid comparison you need to do a separate insert statement per item in foos, for a even closer comparison you need to do ExecuteSqlCommand per item too. Chaining them in a single statement like you did is not a fair comparision. – Scott Chamberlain Jul 20 '17 at 20:58
  • @ScottChamberlain I updated it to be a fair comparison, like you suggested, and it's still an order of magnitude faster – Andrew Williamson Jul 21 '17 at 03:23
  • 1
    Can you try to use a loop around the `ExecuteSqlCommand()` without the `string.Join()` but separate statements instead? Is there anything special on the tables your are dealing with? A clustered index perhaps? I am really surprised to see this hefty difference in execution time. It's got to be something pretty obvious... – dnickless Jul 21 '17 at 05:41
  • `SaveChanges` executes all the inserts within one transaction, so I thought the closest comparison would be to prepare the SQL, and execute it all within one `ExecuteSqlCommand`. There is no special indexing on the table - this basic example is enough to show the difference in execution time, and what you see is what you get (except the migration file, which you can generate yourself with EF). There's nothing special about it... – Andrew Williamson Jul 31 '17 at 22:52
  • Is proxy creation enabled? Does the performance change when it's disabled? – grek40 Jul 31 '17 at 23:09
  • I have tried disabling both proxy creation and change tracking. Yes, they make a slight difference, but it is miniscule in comparison to the difference when writing the raw sql. – Andrew Williamson Jul 31 '17 at 23:14

3 Answers3

5

While doing some research on your question I came across this enlightening article: http://www.codinghelmet.com/?path=howto/bulk-insert

Here's a quote:

Each object that was inserted required two SQL statements - one to insert a record, and additional one to obtain identity of the new record

This becomes a problem when inserting multiple records. A problem which is intensified by the fact that each record is inserted one at a time (But this is outside the context of your question since you're already testing the one by one insert). So if you're inserting 200 records, that's 400 sql statements being executed one by one.

So from my understanding EF is simply not built for bulk insertion. Even if it's as simple as inserting 200 records. Which to me seems like a big let down.

I started thinking, "Then what is EF good for anyway. It can't even insert a couple of records". Well i'll give EF props in two areas:

  1. Select Queries: It's very easy to write your queries and get data into your application quickly.
  2. Simplifying insertion of complex records. If you've ever had a table with a lot of foreign keys and you've tried inserting all the linked records in one transaction, you know what I'm talking about. Thankfully EF inserts each record in order and links all the related records in one single transaction for you. But as mentioned above, this comes at a cost.

So simply put, it seems like, if you have an operation that requires inserting a bunch of records, it might be best to use SqlBulkCopy. Which can insert thousands of records in seconds.

I know this might not be the answer you want to hear, because believe me it upsets me as well since I use EF alot, but I don't see any way around it

Sal
  • 5,129
  • 5
  • 27
  • 53
  • I would like to add that EF Core performs the inserts (and identity retrieval) much better than EF6 (with single db roundtrip). But the query translation currently is much worse than EF6. So in addition to the obvious transition costs, no full happiness at this moment :) – Ivan Stoev Aug 01 '17 at 16:53
  • EF seems to be lacking some key features to me - namely bulk insert, bulk update, and bulk delete. These are pretty common operations, and would be easy enough to implement if it weren't for navigation properties. – Andrew Williamson Aug 01 '17 at 23:11
0

This is just a guess, but have you tried running a second equivalent query through Entity Framework after already completing the first query, then timing that to see if it's closer to the raw SQL time?

This answer and others indicate that Entity Framework is slow to execute the first query because it has the overhead of building out the model. I don't know if this is exactly the problem you're seeing, but it seems possible. Either way, it could be useful to know if the second run-through is significantly faster than the first so that we either prove or rule out that possibility.

Kdawg
  • 1,508
  • 13
  • 19
  • Although the sample doesn't show it, I can rule out that possibility, because the production code that has this issue is being run continuously. I think @Paul's answer is probably the real reason. – Andrew Williamson Aug 01 '17 at 01:59
  • @Andrew Williamson It's very possible that his answer is correct, but I think it would still be worthwhile to check it out (and it would only take short time to prove/disprove it) and know for certain. If that does make a big difference, it may indicate that the issue you're seeing in your production code is something entirely different. – Kdawg Aug 01 '17 at 02:03
  • I know for certain that this is not the case. I have a separate production application, in which this segment of code is not the only query to be run, nor is it the first query to run, but it is consistently showing up in Application Insights as the bottleneck in the application. – Andrew Williamson Aug 01 '17 at 02:07
  • 1
    @Andrew Williamson Okay, but I still think it's kind of odd not to at least prove it/disprove it for your MVCE when it would take at most a few minutes to do so; at the very least, what you've added here in the comments would probably be useful information to include in your question. – Kdawg Aug 01 '17 at 02:14
-1

Since you can't live with it and can't live without it, have you considered calling SaveChangesAsync() instead?

I searched far and wide to find a way to disable primary key synchronization but could not find any for EF 6 and lesser.

EF core passes a true from DBContext.SaveChanges() to what I believe eventually triggers this synchronization. The other overload allows callers to pass false as the controlling parameter.

Tanveer Badar
  • 5,438
  • 2
  • 27
  • 32