
I'm trying to import around 900K rows of data and map them to my new data model. My problem is that the console application I have built for this import slows down over time.

I have monitored the SQL queries and they all perform well (<5ms). I tried importing smaller chunks, e.g. 1K rows at a time. The stopwatch timings look like this:

  • Count: 100 | Avg ms: 36
  • Count: 200 | Avg ms: 67
  • Count: 300 | Avg ms: 106
  • Count: 400 | Avg ms: 145
  • Count: 500 | Avg ms: 183
  • Count: 600 | Avg ms: 222
  • Count: 700 | Avg ms: 258
  • Count: 800 | Avg ms: 299
  • Count: 900 | Avg ms: 344
  • Count: 1000 | Avg ms: 376

When restarting the application with a new chunk of 1K rows, the timings are similar.

The import data has the following format:

public class ImportData
{
    public int Id { get; set; }
    public int TaxpayerId { get; set; }
    public string CustomerName { get; set; }
    public string Email { get; set; }
    public string PhoneNumber { get; set; }
}

A simplified example of my data model looks like this:

public class Channel
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Permission
{
    public Guid Id { get; set; }
    public Channel Channel { get; set; }
    public string Recipient { get; set; }
}

public class Taxpayer
{
    public Guid Id { get; set; }
    public int TaxpayerId { get; set; }
    public string Name { get; set; }
    public List<Permission> Permissions { get; set; }        
}

My import method looks like this:

public void Import()
{
    Stopwatch stopwatch = new Stopwatch();

    //Get import data
    List<ImportData> importDataList = _dal.GetImportData();

    stopwatch.Start();

    for (int i = 0; i < importDataList.Count; i++)
    {
        ImportData importData = importDataList[i];

        Taxpayer taxpayer = new Taxpayer()
        {
            Name = importData.CustomerName,
            TaxpayerId = importData.TaxpayerId,
            Permissions = new List<Permission>()
        };
        //Does not call SaveChanges on the context
        CreateTaxpayer(taxpayer, false); 

        //Create permissions
        if (!string.IsNullOrWhiteSpace(importData.Email))
        {
            //Does not call SaveChanges on the context
            CreatePermission(_channelIdEmail, importData.Email, taxpayer, PermissionLogType.PermissionRequestAccepted);
        }
        if (!string.IsNullOrWhiteSpace(importData.PhoneNumber))
        {
            //Does not call SaveChanges on the context
            CreatePermission(_channelIdPhoneCall, importData.PhoneNumber, taxpayer, PermissionLogType.PermissionRequestAccepted);
            //Does not call SaveChanges on the context
            CreatePermission(_channelIdSms, importData.PhoneNumber, taxpayer, PermissionLogType.PermissionRequestAccepted);
        }

        if ((i + 1) % 100 == 0)
        {
            Console.WriteLine("Count: " + (i + 1) + " | Avg ms: " + stopwatch.ElapsedMilliseconds / 100);
            stopwatch.Restart();
        }
    }
    _dal.SaveChanges();
}

I tried the following:

  • reduce the number of calls to SaveChanges (it is only called once, at the end)
  • implement multithreading (without luck) - it does not seem to go hand in hand with Entity Framework

I am running out of ideas here. Do you guys have any suggestions to solve this performance issue?

Thomas D
  • Please create a [mcve]. The code you show has no obvious issues. I would suspect you do a Linq (either to Entities or to Objects) query somewhere that becomes progressively slower as you add items to a collection. – CodeCaster Feb 10 '17 at 10:00
  • *Don't* use an ORM. Use SqlBulkCopy. ORMs aren't meant for batch operations, it's like using tweezers to move truckloads of rocks. You end up tracking *all* records, sending individual INSERT requests for *each* insertion. You *could* disable change tracking and add an extension that adds batching to EF so that multiple requests are sent together, but you still gain nothing from using EF over SqlBulkCopy. – Panagiotis Kanavos Feb 10 '17 at 10:04
  • Possible duplicate of [SqlBulkCopy and Entity Framework](http://stackoverflow.com/questions/2553545/sqlbulkcopy-and-entity-framework) – Panagiotis Kanavos Feb 10 '17 at 10:07
  • You need to dispose your _dal object for each iteration (or even better, use a new one for each iteration) - also improve perf by using gcServer in app.config - but best approach is to NOT use EF, but SqlBulkCopy – ErikEJ Feb 10 '17 at 10:20
  • Changes to the context object, tracking and batching would only give percentage benefits. SqlBulkCopy would result in orders of magnitude improvements. It's not only that `BULK INSERT` over a stream of rows is far faster, it also uses minimal logging. The transaction log doesn't record every single INSERT statement. It only logs enough data to rollback if needed. This results in a *LOT* less disk IO, log usage and fragmentation – Panagiotis Kanavos Feb 10 '17 at 10:31
  • Thanks so far, guys. I'll have a look at SqlBulkCopy! – Thomas D Feb 10 '17 at 10:56
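
For reference, here is a minimal sketch of the change-tracking workaround suggested in the comments above - work in fixed-size batches, disable automatic change detection, and use a fresh context per batch so the tracker never grows large. It assumes EF6 and direct access to the DbContext behind _dal; MyDbContext and the mapping shown are placeholders, not the asker's actual code:

// Sketch only: process the import in fixed-size batches with a fresh context
// per batch so the change tracker never grows large. MyDbContext is a
// placeholder for whatever DbContext _dal wraps.
const int batchSize = 1000;
for (int offset = 0; offset < importDataList.Count; offset += batchSize)
{
    using (var context = new MyDbContext())
    {
        // DetectChanges scans every tracked entity on each Add; with thousands
        // of tracked entities this is what makes each insert progressively slower.
        context.Configuration.AutoDetectChangesEnabled = false;
        context.Configuration.ValidateOnSaveEnabled = false;

        foreach (var importData in importDataList.Skip(offset).Take(batchSize))
        {
            var taxpayer = new Taxpayer
            {
                Id = Guid.NewGuid(), // assumes the Guid key is generated client-side
                Name = importData.CustomerName,
                TaxpayerId = importData.TaxpayerId,
                Permissions = new List<Permission>()
            };
            context.Set<Taxpayer>().Add(taxpayer);
        }

        context.SaveChanges(); // one save (and one change-detection pass) per batch
    }
}

This typically turns the linearly growing per-row cost into a roughly constant one, but as the comments note it still sends individual INSERTs, so SqlBulkCopy remains much faster for 900K rows.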

1 Answer


Why don't you use SqlBulkCopy? This code will need amending for your particular tables and columns, but hopefully you get the idea:

// Requires: System.Data, System.Data.SqlClient, System.ComponentModel, System.Linq
using (var bulkCopy = new SqlBulkCopy(_DbContext.Database.Connection.ConnectionString, SqlBulkCopyOptions.TableLock))
{
    bulkCopy.BulkCopyTimeout = 1200; // 20 minutes
    bulkCopy.BatchSize = 10000;
    bulkCopy.DestinationTableName = "Taxpayer";

    var table = new DataTable();
    var props = TypeDescriptor.GetProperties(typeof(Taxpayer))
        // Dirty hack to make sure we only have system data types,
        // i.e. filter out the relationships/collections
        .Cast<PropertyDescriptor>()
        .Where(propertyInfo => propertyInfo.PropertyType.Namespace.Equals("System"))
        .ToArray();

    foreach (var propertyInfo in props)
    {
        bulkCopy.ColumnMappings.Add(propertyInfo.Name, propertyInfo.Name);
        table.Columns.Add(propertyInfo.Name, Nullable.GetUnderlyingType(propertyInfo.PropertyType) ?? propertyInfo.PropertyType);
    }

    // The property descriptors belong to Taxpayer, so read the values from
    // Taxpayer entities mapped from the import rows (taxpayerList, see the
    // sketch below the code), not from the raw ImportData objects.
    var values = new object[props.Length];
    foreach (var taxpayer in taxpayerList)
    {
        for (var i = 0; i < values.Length; i++)
        {
            // TODO: Decide which columns need including
            values[i] = props[i].GetValue(taxpayer);
        }
        table.Rows.Add(values);
    }

    bulkCopy.WriteToServer(table);
}
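
For completeness, a rough sketch of how taxpayerList could be built from the import rows before the bulk copy. Generating the Guid keys client-side is an assumption; if the database generates them, use a staging table or read the keys back instead:

var taxpayerList = importDataList.Select(d => new Taxpayer
{
    Id = Guid.NewGuid(), // assumption: keys are generated client-side
    TaxpayerId = d.TaxpayerId,
    Name = d.CustomerName
}).ToList();

The Permission rows can be written the same way with a second SqlBulkCopy and DataTable targeting the Permission table, using the generated taxpayer keys for its foreign key column.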
Steve Ford