
I am trying to insert about 50,000 rows into a MS SQL Server database via Entity Framework 6.1.3, but it takes too long. I followed this answer: I disabled AutoDetectChangesEnabled and call SaveChanges after adding every 1,000 entities. It still takes about 7-8 minutes. I tried this with both a remote server and a local server; there is not much difference. I don't think this is normal. Did I forget something?

Here is my code:

    static void Main(string[] args)
    {

        var personCount = 50000;
        var personList = new List<Person>();
        var random = new Random();

        for (int i = 0; i < personCount; i++)
        {
            personList.Add(new Person
            {
                CreateDate = DateTime.Now,
                DateOfBirth = DateTime.Now,
                FirstName = "Name",
                IsActive = true,
                IsDeleted = false,
                LastName = "Surname",
                PhoneNumber = "01234567890",
                PlaceOfBirth = "Trabzon",
                Value0 = random.NextDouble(),
                Value1 = random.Next(),
                Value10 = random.NextDouble(),
                Value2 = random.Next(),
                Value3 = random.Next(),
                Value4 = random.Next(),
                Value5 = random.Next(),
                Value6 = "Value6",
                Value7 = "Value7",
                Value8 = "Value8",
                Value9 = random.NextDouble()
            });
        }

        MyDbContext context = null;

        try
        {
            context = new MyDbContext();
            context.Configuration.AutoDetectChangesEnabled = false;

            int count = 0;
            foreach (var entityToInsert in personList)
            {
                ++count;
                context = AddToContext(context, entityToInsert, count, 1000, true);
            }

            context.SaveChanges();
        }
        finally
        {
            if (context != null)
                context.Dispose();
        }

    }

    // Adds the entity, saves every commitCount entities, and optionally recreates
    // the context so the change tracker never grows large.
    private static MyDbContext AddToContext(MyDbContext context, Person entity, int count, int commitCount, bool recreateContext)
    {
        context.Set<Person>().Add(entity);

        if (count % commitCount == 0)
        {
            context.SaveChanges();
            if (recreateContext)
            {
                context.Dispose();
                context = new MyDbContext();
                context.Configuration.AutoDetectChangesEnabled = false;
            }
        }

        return context;
    } 

Person class:

public class Person
{
    public int Id { get; set; }

    [MaxLength(50)]
    public string FirstName { get; set; }

    [MaxLength(50)]
    public string LastName { get; set; }

    public DateTime DateOfBirth { get; set; }

    [MaxLength(50)]
    public string PlaceOfBirth { get; set; }

    [MaxLength(15)]
    public string PhoneNumber { get; set; }

    public bool IsActive { get; set; }

    public DateTime CreateDate { get; set; }

    public int Value1 { get; set; }

    public int Value2 { get; set; }

    public int Value3 { get; set; }

    public int Value4 { get; set; }

    public int Value5 { get; set; }

    [MaxLength(50)]
    public string Value6 { get; set; }

    [MaxLength(50)]
    public string Value7 { get; set; }

    [MaxLength(50)]
    public string Value8 { get; set; }

    public double Value9 { get; set; }

    public double Value10 { get; set; }

    public double Value0 { get; set; }

    public bool IsDeleted { get; set; }
}

Query tracked from profiler:

exec sp_executesql N'INSERT [dbo].[Person]([FirstName], [LastName], [DateOfBirth], [PlaceOfBirth], [PhoneNumber], [IsActive], [CreateDate], [Value1], [Value2], [Value3], [Value4], [Value5], [Value6], [Value7], [Value8], [Value9], [Value10], [Value0], [IsDeleted])
VALUES (@0, @1, @2, @3, @4, @5, @6, @7, @8, @9, @10, @11, @12, @13, @14, @15, @16, @17, @18)
SELECT [Id]
FROM [dbo].[Person]
WHERE @@ROWCOUNT > 0 AND [Id] = scope_identity()',N'@0 nvarchar(50),@1 nvarchar(50),@2 datetime2(7),@3 nvarchar(50),@4 nvarchar(15),@5 bit,@6 datetime2(7),@7 int,@8 int,@9 int,@10 int,@11 int,@12 nvarchar(50),@13 nvarchar(50),@14 nvarchar(50),@15 float,@16 float,@17 float,@18 bit',@0=N'Name',@1=N'Surname',@2='2017-01-19 10:59:09.9882591',@3=N'Trabzon',@4=N'01234567890',@5=1,@6='2017-01-19 10:59:09.9882591',@7=731825903,@8=1869842619,@9=1701414555,@10=1468342767,@11=1962019787,@12=N'Value6',@13=N'Value7',@14=N'Value8',@15=0,65330243467041405,@16=0,85324223938083377,@17=0,7146566792925152,@18=0

I want to solve this with EF only. I know there are plenty of alternatives, but let's assume there are no other options.

The main problem is that I used the same approach as the answer I referenced. That answer inserts 560,000 entities in 191 seconds, but I can only insert 50,000 in about 7 minutes.

  • Did you try `context.Configuration.ValidateOnSaveEnabled = false;`? – kgzdev Jan 19 '17 at 08:29
  • Could you show us the definition of Person? Maybe there are many indexed columns? Why do you need the AddToContext method? Try removing it. – SirBirne Jan 19 '17 at 08:31
  • Consider using https://efbulkinsert.codeplex.com/ or similar for big inserts. Another thing I did before was opening multiple contexts and just inserting everything in parallel. – Mats391 Jan 19 '17 at 08:33
  • Edited the question. Will try them soon. (I do not want to use efbulkinsert.) – gkonuralp Jan 19 '17 at 08:34
  • You can use Bulk Insert for this: https://efbulkinsert.codeplex.com/ – K D Jan 19 '17 at 08:37
  • You can also try to re-create the context after each insert block. That way the `ChangeTracker` won't grow huge, which should help and also reduce the memory footprint. – Mats391 Jan 19 '17 at 08:37
  • ORMs in general and EF specifically are **NOT** meant to perform bulk operations. They are meant to work with entities. Bulk operations deal with raw data, not entities. If you want fast performance, use [SqlBulkCopy](https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy(v=vs.110).aspx). – Panagiotis Kanavos Jan 19 '17 at 08:37
  • @Mats391 that will only *slow* things down even more. In any case, if you don't want change tracking, just turn it off. – Panagiotis Kanavos Jan 19 '17 at 08:38
  • @Mats391 "parallel" insertions will also *slow* things down, a lot, due to concurrency and IO conflicts on the server, updating indexes, acquiring and releasing locks on the table and indexes, etc. The problem is using an ORM in the first place. *Batching* multiple operations also helps, as instead of 1000 round trips you perform only 1. Nothing is faster, though, than actually using the appropriate mechanism for bulk inserts. – Panagiotis Kanavos Jan 19 '17 at 08:40
  • Could someone more familiar with EF explain to me why the SELECT? Thanks. – dean Jan 19 '17 at 08:46
  • @dean `Id` is obviously an IDENTITY, so EF has to generate a statement that returns the new ID. Another reason why using ORMs for bulk operations is a bad idea: with entities, you want to know the IDs; with bulk operations, you don't care at all. – Panagiotis Kanavos Jan 19 '17 at 08:59
  • Don't use EF for bulk insert! Use a stored procedure or a bulk insert command to insert bulk records. If you still want to go with EF for bulk insert, then this EF extension might be helpful in this case: https://efbulkinsert.codeplex.com/ – Ankush Madankar Jan 19 '17 at 09:08
  • @PanagiotisKanavos Thanks. These SELECTs are probably what kills the OP's performance for the most part (as a combination of statistics rebuilds and table scans in between). Of course, bulk insert will always win. – dean Jan 19 '17 at 10:39
  • @dean no, they aren't. Simply using an ORM does. They *only* double the traffic, just a 2x increase. Executing multiple INSERT statements, though, results in N times more calls. Batching multiple requests takes care of that but still results in executing and logging N INSERT statements. A bulk operation, though, performs minimal logging for all the data, without sending anything back. This means a lot less IO for the transaction log. – Panagiotis Kanavos Jan 19 '17 at 10:43
  • @gkon a BULK INSERT for 50K rows shouldn't take even 30 seconds. – Panagiotis Kanavos Jan 19 '17 at 10:48
  • @gkon row insertion speeds *can't* be compared. They are affected by hardware performance, indexing and the server version. Indexes on a table have to be updated each time you enter a new row. If you need to load a lot of data fast, you need to *disable* the indexes first, insert the data, then rebuild the indexes. – Panagiotis Kanavos Jan 19 '17 at 10:53
  • @PanagiotisKanavos That difference cannot be caused by hardware if we are using today's computers, I think. I created this solution only to test and improve EF large-quantity inserts, so I am not looking for alternative advice. I know all of the alternatives and yes, I could use them, but for now I am just trying to solve a problem. Taking 7 minutes to insert 50,000 rows is not normal whichever ORM is used. If it took 1 minute, that would be good enough. – gkonuralp Jan 19 '17 at 11:04
  • @gkon of course it can! Single SSD, RAID or SAN? Partitioned tables or all on a single machine? Transaction log on the same disk as the database, thus stealing IOPS? Memory constraints causing spills to tempdb? Is tempdb on fast or slow storage? – Panagiotis Kanavos Jan 19 '17 at 11:07
  • @gkon but that doesn't take into account even more important factors: how many indexes? Is there any other traffic on the table? *How much data*? A wide row obviously requires more IO than a narrow row. There are a *LOT* of concerns that have nothing to do with objects and just can't be controlled or addressed through an ORM. PS: I import that many rows, parsed from multi-MB airline ticket files, every 15 minutes. EF is only used on the web site; the parsers and ETL all use bulk operations and SSIS. – Panagiotis Kanavos Jan 19 '17 at 11:09
  • Even uploading data, e.g. the global list of airports, goes through SqlBulkCopy. Bulk insert into a temporary table, UPSERT the target table, finished in a couple of seconds. Try updating such a high-traffic table directly through an ORM. Timeouts guaranteed. – Panagiotis Kanavos Jan 19 '17 at 11:12
  • Yes. But I am saying 560,000 in 191 secs vs 50,000 in 450 secs. This difference is insane. I tried it on my personal computer and on a full-throttle server. Almost the same result. The facts you mention are of course true, and if this were a real project, of course I would use SqlBulkCopy. – gkonuralp Jan 19 '17 at 11:14
  • Which could mean that the problem is your own code and how fast it sends individual commands. Or you may be measuring the time it takes to reallocate your `List<>`. Or the cost of recreating the context, which doesn't make much sense. Or it could be that your test code uses so much CPU and RAM that it delays the local database. You just can't compare speeds like this. – Panagiotis Kanavos Jan 19 '17 at 12:03
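
For reference, a minimal sketch of the configuration tweak suggested in the first comment above, combined with the setting the question already uses (whether ValidateOnSaveEnabled makes a measurable difference for this workload is an assumption worth testing):

context = new MyDbContext();
context.Configuration.AutoDetectChangesEnabled = false; // already disabled in the question
context.Configuration.ValidateOnSaveEnabled = false;    // additionally skip per-entity validation during SaveChanges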

1 Answer


You already got rid of the ChangeTracker problem by disabling AutoDetectChanges.

I normally recommend using one of these solutions:

  1. AddRange over Add (Recommended; a sketch follows the link below)
  2. SET AutoDetectChanges to false
  3. SPLIT SaveChanges into multiple batches

See: http://entityframework.net/improve-ef-add-performance
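
A minimal sketch of option 1, assuming the question's MyDbContext and personList and reusing Set<Person>() as the question's code does; AddRange runs change detection once for the whole collection instead of once per Add, though each row still costs its own INSERT round-trip:

using (var context = new MyDbContext())
{
    // One DetectChanges pass for the whole collection instead of one per Add.
    context.Set<Person>().AddRange(personList);
    context.SaveChanges();
}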

Making multiple batches won't really improve or decrease performance, since you already set AutoDetectChanges to false.

The major problem is that Entity Framework makes a database round-trip for every entity you need to insert. So, if you INSERT 50,000 entities, 50,000 database round-trips are performed, which is INSANE.

What you need to do to solve your issue is to reduce the number of database round-trips.

One free way to do it is using SqlBulkCopy: https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy(v=vs.110).aspx
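
For illustration, a hedged sketch of what a SqlBulkCopy load of the same rows could look like; the connection string, the helper name, and the idea of staging the rows in a DataTable are assumptions, and only a few of the dbo.Person columns from the profiler output are shown (the rest follow the same pattern):

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Hypothetical helper: loads Person rows with one bulk operation instead of per-row INSERTs.
static void BulkInsertPersons(List<Person> people, string connectionString)
{
    // Stage the rows in a DataTable whose columns match dbo.Person (Id is IDENTITY, so it is skipped).
    var table = new DataTable();
    table.Columns.Add("FirstName", typeof(string));
    table.Columns.Add("LastName", typeof(string));
    table.Columns.Add("DateOfBirth", typeof(DateTime));
    // ... add the remaining Person columns the same way

    foreach (var p in people)
        table.Rows.Add(p.FirstName, p.LastName, p.DateOfBirth /*, ... remaining values */);

    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.Person";
        bulk.BatchSize = 5000; // rows sent per batch; tune as needed

        // Map by name so the DataTable column order does not have to match the table.
        bulk.ColumnMappings.Add("FirstName", "FirstName");
        bulk.ColumnMappings.Add("LastName", "LastName");
        bulk.ColumnMappings.Add("DateOfBirth", "DateOfBirth");
        // ... map the remaining columns

        bulk.WriteToServer(table); // one bulk load instead of 50,000 individual INSERTs
    }
}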


Disclaimer: I'm the owner of Entity Framework Extensions

This library allows you to perform all bulk operations you need for your scenarios:

  • Bulk SaveChanges
  • Bulk Insert
  • Bulk Delete
  • Bulk Update
  • Bulk Merge

You will be able to insert 50,000 entities in a few seconds.

Example

// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize Primary Key
context.BulkMerge(customers, operation => {
   operation.ColumnPrimaryKeyExpression = 
        customer => customer.Code;
});

Does your extension do that? Or do you know any other way to save many rows with geometry columns efficiently?

Yes, our library supports geometry columns.

  • Where do people find the sql.data.sqlclient library? Some posts suggest installing the .NET Compact Framework. Does this mean it can only work against a local db, not a remote database? – Shinya Koizumi Jul 27 '17 at 19:35
  • Hello @powerfade917, I'm not sure I understand your question. – Jonathan Magnan Jul 28 '17 at 04:39
  • I was wondering if SqlBulkCopy can be used with a database that is not local, since the .NET Compact Framework is mainly for mobile or tablets. – Shinya Koizumi Jul 29 '17 at 07:09
  • @powerfade917, no, you cannot use SqlBulkCopy with SQL Compact. With SQL Compact, the fastest way to insert records is by using a DbDataReader && CreateRecord. You can find an open-source example here: https://sqlcebulkcopy.codeplex.com/ – Jonathan Magnan Jul 30 '17 at 16:46
  • Why haven't you been bought by Microsoft and this code shipped in native Entity Framework? Or why are these faster updates not part of EF in the first place? – Vlado Pandžić Jun 19 '20 at 14:34
  • @JonathanMagnan: your BulkInsert is an awesome solution, but it is paid and the company is not ready to spend money on a custom bulk-insert dll. – Saroop Trivedi Nov 02 '20 at 23:59
  • @JonathanMagnan SqlBulkCopy doesn't support geometry columns. Does your extension do that? Or do you know any other way to save many rows with geometry columns efficiently? – Johan Jul 14 '21 at 12:20
  • In my case it increased insert speed by 14 times! Thanks, man. – MohammadHossein R Nov 19 '22 at 17:28