116

I want to insert 20,000 records into a table using Entity Framework, and it takes about 2 minutes. Is there any way, other than using a stored procedure, to improve performance? This is my code:

 foreach (Employees item in sequence)
 {
    t = new Employees();
    t.Text = item.Text;
    dataContext.Employees.AddObject(t);
 }
 dataContext.SaveChanges();
Vahid Ghadiri
    Check out this solution [on using SqlBulkCopy for Generic List ](http://elegantcode.com/2012/01/26/sqlbulkcopy-for-generic-listt-useful-for-entity-framework-nhibernate/). Works with code first POCOs and will be much much faster. – shox Mar 31 '12 at 23:29
  • I used this on some data and improved my insert time from longer than half an hour to ~15s (for around 50K rows). – shox Oct 15 '12 at 04:04
  • SqlBulkCopy is (and always has been) the fastest way to insert records into SQL Server. I think the implementation I've provided in the answer below is better than @dubbreak's. The gotcha I've described would also apply to that code. – Mick Feb 10 '14 at 23:47

11 Answers

220

There is room for several improvements (if you are using DbContext):

Set:

yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;

Do SaveChanges() in batches of 100 inserts... or you can try batches of 1000 items and see the change in performance.

Since the context is the same during all these inserts and keeps getting bigger, you can rebuild your context object every 1000 inserts: var yourContext = new YourContext(); I think this is where the big gain is.

Making these improvements in a data import process of mine took it from 7 minutes to 6 seconds.

The actual numbers... may not be 100 or 1000 in your case... try it and tweak it.
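
Putting it all together, a minimal sketch of the pattern (YourContext stands in for your own DbContext subclass, and the batch size of 1000 is just a starting point to tune):

const int batchSize = 1000;

var yourContext = new YourContext();
yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;

int count = 0;
foreach (var item in sequence)
{
    yourContext.Employees.Add(new Employees { Text = item.Text });

    if (++count % batchSize == 0)
    {
        yourContext.SaveChanges();

        // Rebuild the context so the change tracker does not keep growing.
        yourContext.Dispose();
        yourContext = new YourContext();
        yourContext.Configuration.AutoDetectChangesEnabled = false;
        yourContext.Configuration.ValidateOnSaveEnabled = false;
    }
}
yourContext.SaveChanges();
yourContext.Dispose();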

Romias
    I did this and my data insert of 19,000 rows went from taking 20 minutes to taking less than 10 seconds – Stuntbeaver Apr 19 '13 at 16:14
  • 40000 rows took about 4 seconds with this. I didn't renew the context, just used the config changes and saved every 1000. Awesome. – Lotok Aug 25 '13 at 21:18
  • I can confirm. This improves bulk import by 100,000%! – Tys Oct 13 '13 at 00:12
  • Just to mention that `AutoDetectChangesEnabled` and `ValidateOnSaveEnabled` are only options for `DbContext`, not for `ObjectContext` which is obviously used in the question. – Slauma Oct 14 '13 at 10:55
  • @Slauma, as I always use DbContext, I didn't know ObjectContext is different. I've already read your answers here on SO, so now I understand the differences. Thanks! – Romias Oct 14 '13 at 12:52
  • I don't even need to save in batches of 100s or 1000s of objects and I can see a huge performance improvement. Using some magic number as a batch size can be dangerous in production code, because it can work in your environment but not in a customer's. Anyway, it works great. – Ondra Nov 08 '13 at 16:27
  • I think I'm the only one, but it made no difference for me. – devuxer Feb 20 '14 at 03:30
  • Thank you! This alone got me from 2 minutes down to 53 seconds. I then proceeded to collect a list of my entities, and after 200 I generate a new context and transaction, call AddOrUpdate(myList.ToArray()); (yes, it takes a range), and finally I clear my list. I proceed until I hit my maximum, constantly newing up the context, in a using of course, and the results were astonishing. I also forced lazy loading in the constructor from the Context.tt file. This update takes 4 seconds now. I can't wait to test this against production-sized data... which is in the hundreds of millions.... – Anthony Mason May 12 '14 at 01:12
  • I changed to it and used SQL Profiler to check the SQL transactions, and found that it still issues one insert statement per transaction? – ca9163d9 Aug 29 '14 at 04:40
  • @dc7a9163d9, the only way for this method to issue several inserts at once is to add several items to the context and then call SaveChanges(). – Romias Aug 29 '14 at 12:52
  • 533 seconds to 15 seconds, and the 15 seconds is spent on getting data from upstream! – Burak Tamtürk Jul 03 '15 at 09:08
  • The whole performance optimization came from the two configuration flags. Doing the work in batches didn't improve my app's performance. – Daniel Argüelles Jan 26 '16 at 16:06
  • I had a process that took 4 minutes to complete and just by using this it came down to 1 minute. Thank you very much. – Nick Masao Apr 19 '17 at 11:13
39

There is no way to force EF to improve performance when doing it this way. The problem is that EF executes each insert in a separate round trip to the database. Awesome, isn't it? Even DataSets supported batch processing. Check this article for some workarounds. Another workaround is a custom stored procedure accepting a table-valued parameter, but you need raw ADO.NET for that.
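
If you go the table-valued parameter route, here is a rough sketch with raw ADO.NET. The type dbo.EmployeeTableType and the procedure dbo.InsertEmployees are assumptions; you would create them yourself on the database side:

// Assumes something like:
//   CREATE TYPE dbo.EmployeeTableType AS TABLE (Text nvarchar(100));
//   CREATE PROCEDURE dbo.InsertEmployees @rows dbo.EmployeeTableType READONLY
//   AS INSERT INTO dbo.Employees (Text) SELECT Text FROM @rows;
var table = new DataTable();
table.Columns.Add("Text", typeof(string));
foreach (var item in sequence)
    table.Rows.Add(item.Text);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.InsertEmployees", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    var parameter = cmd.Parameters.AddWithValue("@rows", table);
    parameter.SqlDbType = SqlDbType.Structured;
    parameter.TypeName = "dbo.EmployeeTableType";

    conn.Open();
    cmd.ExecuteNonQuery(); // one round trip for all rows
}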

Ladislav Mrnka
  • You can also check my answer; there is room to improve performance. – Romias Apr 19 '13 at 16:44
    I'm not sure why this is the accepted answer as it is blatantly false. There are ways to improve performance when doing large inserts with EF. Romias mentions one of them; another is to wrap all the inserts in a single transaction scope. If those options still are not performing well enough for you (you likely have other issues if that is the case), you can grab the Connection object from the `context` and use it with a `SQLBulkCopy` object to load the data. – Zac Howland Aug 13 '13 at 17:11
25

Using the code below, you can extend the partial context class with a method that takes a collection of entity objects and bulk copies them to the database. Simply replace MyEntities with whatever your context class is named, and add the code to your project, in the correct namespace. After that, all you need to do is call the BulkInsertAll method, handing over the entity objects you want to insert. Do not reuse the context instance; instead, create a new one every time you use it. This is required, at least in some versions of EF, since the authentication data associated with the SqlConnection used here gets lost after the class has been used once. I don't know why.

This version is for EF 5

public partial class MyEntities
{
    public void BulkInsertAll<T>(T[] entities) where T : class
    {
        var conn = (SqlConnection)Database.Connection;

        conn.Open();

        Type t = typeof(T);
        Set(t).ToString();
        var objectContext = ((IObjectContextAdapter)this).ObjectContext;
        var workspace = objectContext.MetadataWorkspace;
        var mappings = GetMappings(workspace, objectContext.DefaultContainerName, typeof(T).Name);

        var tableName = GetTableName<T>();
        var bulkCopy = new SqlBulkCopy(conn) { DestinationTableName = tableName };

        // Foreign key relations show up as virtual declared 
        // properties and we want to ignore these.
        var properties = t.GetProperties().Where(p => !p.GetGetMethod().IsVirtual).ToArray();
        var table = new DataTable();
        foreach (var property in properties)
        {
            Type propertyType = property.PropertyType;

            // Nullable properties need special treatment.
            if (propertyType.IsGenericType &&
                propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
            {
                propertyType = Nullable.GetUnderlyingType(propertyType);
            }

            // Since we cannot trust the CLR type properties to be in the same order as
            // the table columns we use the SqlBulkCopy column mappings.
            table.Columns.Add(new DataColumn(property.Name, propertyType));
            var clrPropertyName = property.Name;
            var tableColumnName = mappings[property.Name];
            bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(clrPropertyName, tableColumnName));
        }

        // Add all our entities to our data table
        foreach (var entity in entities)
        {
            var e = entity;
            table.Rows.Add(properties.Select(property => GetPropertyValue(property.GetValue(e, null))).ToArray());
        }

        // send it to the server for bulk execution
        bulkCopy.BulkCopyTimeout = 5 * 60;
        bulkCopy.WriteToServer(table);

        conn.Close();
    }

    private string GetTableName<T>() where T : class
    {
        var dbSet = Set<T>();
        var sql = dbSet.ToString();
        var regex = new Regex(@"FROM (?<table>.*) AS");
        var match = regex.Match(sql);
        return match.Groups["table"].Value;
    }

    private object GetPropertyValue(object o)
    {
        if (o == null)
            return DBNull.Value;
        return o;
    }

    private Dictionary<string, string> GetMappings(MetadataWorkspace workspace, string containerName, string entityName)
    {
        var mappings = new Dictionary<string, string>();
        var storageMapping = workspace.GetItem<GlobalItem>(containerName, DataSpace.CSSpace);
        dynamic entitySetMaps = storageMapping.GetType().InvokeMember(
            "EntitySetMaps",
            BindingFlags.GetProperty | BindingFlags.NonPublic | BindingFlags.Instance,
            null, storageMapping, null);

        foreach (var entitySetMap in entitySetMaps)
        {
            var typeMappings = GetArrayList("TypeMappings", entitySetMap);
            dynamic typeMapping = typeMappings[0];
            dynamic types = GetArrayList("Types", typeMapping);

            if (types[0].Name == entityName)
            {
                var fragments = GetArrayList("MappingFragments", typeMapping);
                var fragment = fragments[0];
                var properties = GetArrayList("AllProperties", fragment);
                foreach (var property in properties)
                {
                    var edmProperty = GetProperty("EdmProperty", property);
                    var columnProperty = GetProperty("ColumnProperty", property);
                    mappings.Add(edmProperty.Name, columnProperty.Name);
                }
            }
        }

        return mappings;
    }

    private ArrayList GetArrayList(string property, object instance)
    {
        var type = instance.GetType();
        var objects = (IEnumerable)type.InvokeMember(property, BindingFlags.GetProperty | BindingFlags.NonPublic | BindingFlags.Instance, null, instance, null);
        var list = new ArrayList();
        foreach (var o in objects)
        {
            list.Add(o);
        }
        return list;
    }

    private dynamic GetProperty(string property, object instance)
    {
        var type = instance.GetType();
        return type.InvokeMember(property, BindingFlags.GetProperty | BindingFlags.NonPublic | BindingFlags.Instance, null, instance, null);
    }
}

This version is for EF 6

public partial class CMLocalEntities
{
    public void BulkInsertAll<T>(T[] entities) where T : class
    {
        var conn = (SqlConnection)Database.Connection;

        conn.Open();

        Type t = typeof(T);
        Set(t).ToString();
        var objectContext = ((IObjectContextAdapter)this).ObjectContext;
        var workspace = objectContext.MetadataWorkspace;
        var mappings = GetMappings(workspace, objectContext.DefaultContainerName, typeof(T).Name);

        var tableName = GetTableName<T>();
        var bulkCopy = new SqlBulkCopy(conn) { DestinationTableName = tableName };

        // Foreign key relations show up as virtual declared 
        // properties and we want to ignore these.
        var properties = t.GetProperties().Where(p => !p.GetGetMethod().IsVirtual).ToArray();
        var table = new DataTable();
        foreach (var property in properties)
        {
            Type propertyType = property.PropertyType;

            // Nullable properties need special treatment.
            if (propertyType.IsGenericType &&
                propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
            {
                propertyType = Nullable.GetUnderlyingType(propertyType);
            }

            // Since we cannot trust the CLR type properties to be in the same order as
            // the table columns we use the SqlBulkCopy column mappings.
            table.Columns.Add(new DataColumn(property.Name, propertyType));
            var clrPropertyName = property.Name;
            var tableColumnName = mappings[property.Name];
            bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(clrPropertyName, tableColumnName));
        }

        // Add all our entities to our data table
        foreach (var entity in entities)
        {
            var e = entity;
            table.Rows.Add(properties.Select(property => GetPropertyValue(property.GetValue(e, null))).ToArray());
        }

        // send it to the server for bulk execution
        bulkCopy.BulkCopyTimeout = 5*60;
        bulkCopy.WriteToServer(table);

        conn.Close();
    }

    private string GetTableName<T>() where T : class
    {
        var dbSet = Set<T>();
        var sql = dbSet.ToString();
        var regex = new Regex(@"FROM (?<table>.*) AS");
        var match = regex.Match(sql);
        return match.Groups["table"].Value;
    }

    private object GetPropertyValue(object o)
    {
        if (o == null)
            return DBNull.Value;
        return o;
    }

    private Dictionary<string, string> GetMappings(MetadataWorkspace workspace, string containerName, string entityName)
    {
        var mappings = new Dictionary<string, string>();
        var storageMapping = workspace.GetItem<GlobalItem>(containerName, DataSpace.CSSpace);
        dynamic entitySetMaps = storageMapping.GetType().InvokeMember(
            "EntitySetMaps",
            BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance,
            null, storageMapping, null);

        foreach (var entitySetMap in entitySetMaps)
        {
            var typeMappings = GetArrayList("EntityTypeMappings", entitySetMap);
            dynamic typeMapping = typeMappings[0];
            dynamic types = GetArrayList("Types", typeMapping);

            if (types[0].Name == entityName)
            {
                var fragments = GetArrayList("MappingFragments", typeMapping);
                var fragment = fragments[0];
                var properties = GetArrayList("AllProperties", fragment);
                foreach (var property in properties)
                {
                    var edmProperty = GetProperty("EdmProperty", property);
                    var columnProperty = GetProperty("ColumnProperty", property);
                    mappings.Add(edmProperty.Name, columnProperty.Name);
                }
            }
        }

        return mappings;
    }

    private ArrayList GetArrayList(string property, object instance)
    {
        var type = instance.GetType();
        var objects = (IEnumerable)type.InvokeMember(
            property, 
            BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance, null, instance, null);
        var list = new ArrayList();
        foreach (var o in objects)
        {
            list.Add(o);
        }
        return list;
    }

    private dynamic GetProperty(string property, object instance)
    {
        var type = instance.GetType();
        return type.InvokeMember(property, BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance, null, instance, null);
    }

}
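
Both versions are called the same way; a minimal usage sketch (here using the EF 5 class name, with an illustrative entity), remembering to create a fresh context instance each time:

using (var context = new MyEntities())
{
    var employees = sequence.Select(x => new Employees { Text = x.Text }).ToArray();
    context.BulkInsertAll(employees);
}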

And finally, a little something for you Linq-To-Sql lovers.

partial class MyDataContext
{
    partial void OnCreated()
    {
        CommandTimeout = 5 * 60;
    }

    public void BulkInsertAll<T>(IEnumerable<T> entities)
    {
        entities = entities.ToArray();

        string cs = Connection.ConnectionString;
        var conn = new SqlConnection(cs);
        conn.Open();

        Type t = typeof(T);

        var tableAttribute = (TableAttribute)t.GetCustomAttributes(
            typeof(TableAttribute), false).Single();
        var bulkCopy = new SqlBulkCopy(conn) { 
            DestinationTableName = tableAttribute.Name };

        var properties = t.GetProperties().Where(EventTypeFilter).ToArray();
        var table = new DataTable();

        foreach (var property in properties)
        {
            Type propertyType = property.PropertyType;
            if (propertyType.IsGenericType &&
                propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
            {
                propertyType = Nullable.GetUnderlyingType(propertyType);
            }

            table.Columns.Add(new DataColumn(property.Name, propertyType));
        }

        foreach (var entity in entities)
        {
            table.Rows.Add(properties.Select(
              property => GetPropertyValue(
              property.GetValue(entity, null))).ToArray());
        }

        bulkCopy.WriteToServer(table);
        conn.Close();
    }

    private bool EventTypeFilter(System.Reflection.PropertyInfo p)
    {
        var attribute = Attribute.GetCustomAttribute(p, 
            typeof (AssociationAttribute)) as AssociationAttribute;

        if (attribute == null) return true;
        if (attribute.IsForeignKey == false) return true; 

        return false;
    }

    private object GetPropertyValue(object o)
    {
        if (o == null)
            return DBNull.Value;
        return o;
    }
}
Måns Tånneryd
  • Anybody know why, when I try this, I get an error referencing `EntitySetMaps`: "Method 'System.Data.Entity.Core.Mapping.EntityContainerMapping.EntitySetMaps' not found." – cjb110 Nov 13 '14 at 10:27
  • Which version of EF are you using? – Måns Tånneryd Nov 14 '14 at 11:24
  • Ah, oops: the EF 6 version of your code, and 6.1.1 according to NuGet. And I'm using code first. The 'slow' method works OK. – cjb110 Nov 14 '14 at 11:55
    They might have changed some of the metadata property names in 6.1.1. I'll check that out. – Måns Tånneryd Nov 14 '14 at 19:24
  • @cjb110, I think some of the methods changed for 6.1.x. EntitySetMappings seems to work for me – Chris Woolum Oct 25 '16 at 23:03
  • @MånsTånneryd thanks! I use EF 6.1.3; it's true that the property names have changed. So I changed **GetMappings()** like this: EntitySetMaps to **EntitySetMappings**; Types to **EntityTypes**; MappingFragments to **Fragments**; AllProperties to **PropertyMappings**; EdmProperty to **Property**; ColumnProperty to **Column** – DarkMFJ Mar 24 '17 at 03:16
7

Maybe this answer here will help you. It seems that you want to dispose of the context periodically. This is because the context gets bigger and bigger as the number of attached entities grows.

MemeDeveloper
5

There are two major performance issues with your code:

  • Using Add method
  • Using SaveChanges

Using Add method

The Add method becomes slower and slower with each entity you add.

See: http://entityframework.net/improve-ef-add-performance

For example, adding 10,000 entities via:

  • Add (takes ~105,000 ms)
  • AddRange (takes ~120 ms)

Note: the entities have not been saved to the database yet!

The problem is that the Add method calls DetectChanges for every entity added, while AddRange does it only once, after all entities have been added to the context.

Common solutions are:

  • Use AddRange instead of Add
  • Set AutoDetectChangesEnabled to false
  • Split SaveChanges into multiple batches (see the sketch below)
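
A minimal sketch of the first two of these (the context and entity names are assumptions):

using (var context = new YourContext())
{
    context.Configuration.AutoDetectChangesEnabled = false;

    var employees = sequence.Select(x => new Employees { Text = x.Text }).ToList();
    context.Employees.AddRange(employees); // one AddRange call instead of thousands of Add calls

    context.SaveChanges();
}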

Using SaveChanges

Entity Framework was not designed for bulk operations. For every entity you save, a database round trip is performed.

So, if you want to insert 20,000 records, you will perform 20,000 database round trips, which is INSANE!

There are some third-party libraries supporting Bulk Insert available:

  • Z.EntityFramework.Extensions (Recommended)
  • EFUtilities
  • EntityFramework.BulkInsert

See: Entity Framework Bulk Insert library

Be careful when choosing a bulk insert library. Only Entity Framework Extensions supports all kinds of associations and inheritance, and it's the only one still supported.


Disclaimer: I'm the owner of Entity Framework Extensions

This library allows you to perform all bulk operations you need for your scenarios:

  • Bulk SaveChanges
  • Bulk Insert
  • Bulk Delete
  • Bulk Update
  • Bulk Merge

Example

// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize Primary Key
context.BulkMerge(customers, operation => {
   operation.ColumnPrimaryKeyExpression = 
        customer => customer.Code;
});

EDIT: Answering a question from the comments

Is there a recommend max size for each bulk insert for the library you created

Not too high, not too low. There isn't a particular value that fits all scenarios, since it depends on multiple factors such as row size, indexes, triggers, etc.

It's normally recommended to be around 4000.

Also is there a way to tie it all in one transaction and not worry about it timing out

You can use an Entity Framework transaction. Our library uses the transaction if one is started. But be careful: a transaction that takes too long also comes with problems such as row/index/table locks.
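
For example, a sketch of wrapping the call in an EF 6 transaction, which the library will pick up:

using (var transaction = context.Database.BeginTransaction())
{
    context.BulkSaveChanges();
    transaction.Commit();
}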

Jonathan Magnan
  • Is there a recommend max size for each bulk insert for the library you created? Also is there a way to tie it all in one transaction and not worry about it timing out? Thanks! – darewreck Dec 21 '17 at 08:25
4

A better way is to skip Entity Framework entirely for this operation and rely on the SqlBulkCopy class. Other operations can continue using EF as before.

That increases the maintenance cost of the solution, but it reduces the time required to insert large collections of objects into the database by one to two orders of magnitude compared to EF.

Here is an article that compares SqlBulkCopy class with EF for objects with parent-child relationship (also describes changes in design required to implement bulk insert): How to Bulk Insert Complex Objects into SQL Server Database
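
A minimal sketch of the plain SqlBulkCopy approach for the flat case (the table name dbo.Employees and the Text column are assumptions taken from the question):

var table = new DataTable();
table.Columns.Add("Text", typeof(string));
foreach (var item in sequence)
    table.Rows.Add(item.Text);

using (var conn = new SqlConnection(connectionString))
using (var bulkCopy = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Employees" })
{
    conn.Open();
    bulkCopy.ColumnMappings.Add("Text", "Text");
    bulkCopy.WriteToServer(table);
}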

Zoran Horvat
  • What happens with foreign key problems or unique key collisions? Is the entire operation rolled back? – Dan Esparza Apr 18 '14 at 15:11
  • Consider bulk insert a business transaction, rather than a system transaction. Your question should be passed to the business owner to decide. I've seen different options in practice, and all are equally good for us programmers: (1) roll back everything and make the user correct the data; (2) commit up to that point and notify the user that the rest wasn't processed; (3) skip and continue, then notify the user of the records that failed. Solutions 2 and 3 require some juggling with exceptions and are generally not trivial to implement. – Zoran Horvat Apr 26 '14 at 17:21
4

In an Azure environment, with a Basic website that has one instance, I tried to insert a batch of 1000 records at a time out of 25,000 records using a for loop. It took 11.5 minutes, but with parallel execution it took less than a minute, so I recommend using the TPL (Task Parallel Library).

var count = (members.Count / 1000) + 1;
Parallel.For(0, count, x =>
{
    ApplicationDbContext db1 = new ApplicationDbContext();
    db1.Configuration.AutoDetectChangesEnabled = false;

    // take the next slice of 1000 records
    var records = members.Skip(x * 1000).Take(1000).ToList();
    db1.Members.AddRange(records);

    db1.SaveChanges();
    db1.Dispose();
});
  • Let me clarify this code: in row 1, `var count = (members.Count / 1000) + 1;`, where **members** is your collection. When I ran this code for my case, I got this error: _Transaction (Process ID 80) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction._ – Abdurrahman I. Apr 24 '17 at 07:32
  • In case exceptions occur, I would rather put the dbContext creation and disposal into a using block. – michal.jakubeczy Oct 13 '17 at 07:47
4

Currently there is no better way; however, there may be a marginal improvement from moving SaveChanges inside the loop and calling it every 10 items or so.

int i = 0;

foreach (Employees item in sequence)
{
   t = new Employees();
   t.Text = item.Text;
   dataContext.Employees.AddObject(t);
   i++;

   // this will save at most 10 items together
   if ((i % 10) == 0)
   {
       dataContext.SaveChanges();
       // show some progress to the user based on
       // the value of i
   }
}
dataContext.SaveChanges();

You can adjust the 10 for better performance. It will not greatly improve speed, but it will allow you to show some progress to the user and make the experience more user friendly.

Akash Kava
3

Try using Bulk Insert....

http://code.msdn.microsoft.com/LinqEntityDataReader

If you have a collection of entities, e.g. storeEntities, you can store them using SqlBulkCopy as follows:

        // Assumes an open SqlConnection and the AsDataReader() extension
        // from the LinqEntityDataReader sample linked above.
        var bulkCopy = new SqlBulkCopy(connection);
        bulkCopy.DestinationTableName = TableName;
        var dataReader = storeEntities.AsDataReader();
        bulkCopy.WriteToServer(dataReader);

There is one gotcha with this code. Make sure that the Entity Framework definition for the entity correlates exactly with the table definition: ensure that the entity's properties are in the same order in the entity model as the columns in the SQL Server table. Failure to do this will result in an exception.

Mick
0

Although this is a late reply, I'm posting the answer because I suffered the same pain. I've created a new GitHub project just for that; as of now, it supports bulk insert/update/delete for SQL Server transparently, using SqlBulkCopy.

https://github.com/MHanafy/EntityExtensions

There are other goodies as well, and hopefully it will be extended to do more down the track.

Using it is as simple as:

var insertsAndupdates = new List<object>();
var deletes = new List<object>();
context.BulkUpdate(insertsAndupdates, deletes);

Hope it helps!

Mahmoud Hanafy
-2
Use: db.Set<TEntity>().AddRange(list);

Example:

TESTEntities db = new TESTEntities();
List<Person> persons = new List<Person> {
    new Person { Name = "p1", Place = "place" },
    new Person { Name = "p2", Place = "place" },
    new Person { Name = "p3", Place = "place" },
    new Person { Name = "p4", Place = "place" },
    new Person { Name = "p5", Place = "place" }
};
db.Set<Person>().AddRange(persons);
db.SaveChanges();
RINESH KM