
I have a loop, abridged here, that performs an import of employee records, as follows:

var cts2005 = new Cts2005Entities();
IEmployeeRepository repository = new EmployeeRepository();

foreach (var c in cts2005.Candidates)
{
    var e = new Employee();

    e.RefNum = c.CA_EMP_ID;
    e.TitleId = GetTitleId(c.TITLE);
    e.Initials = c.CA_INITIALS;
    e.Surname = c.CA_SURNAME;

    repository.Insert(e);
}

There are actually several more fields, and a total of nine lookups like GetTitleId(c.TITLE) above. Code for these is all exactly like this:

private List<Title> _titles;
private Guid GetTitleId(string titleName)
{
    ITitleRepository repository = new TitleRepository();
    if (_titles == null)
    {
        _titles = repository.ListAll().ToList();
    }
    var title = _titles.FirstOrDefault(t => String.Compare(t.Name, titleName, StringComparison.OrdinalIgnoreCase) == 0);
    if (title == null)
    {
        title = new Title { Name = titleName };
        repository.Insert(title);
        _titles.Add(title);
    }
    return title.Id;
}

All repository.Insert() calls look like this, except the entity types differ:

public void Insert(Employee entity)
{
    CurrentDbContext.Employees.Add(entity);
    CurrentDbContext.SaveChanges();
}

All PKs are Guids. I know that could be a minor issue, but I didn't expect it to have such a large effect at volumes this small.

I have done no tuning or optimization on this routine yet, as it was only for my small test db, but yesterday I was forced to do a surprise import of 6000 records. Toward the end, this process had slowed to about 1 sec per record, which is quite dismal. I wouldn't have expected high speeds without some tuning, but nothing as slow as that.

Is there anything obviously, grossly wrong with my method here?

marc_s
ProfK
  • Entity Framework is not designed or intended to be used for bulk imports. For that, you should investigate the [SqlBulkCopy](http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx) class in .NET. Also: if you insist on using EF for bulk insert, **do not** call `.SaveChanges()` for every entity - insert x entities one after another, and then call `.SaveChanges()` **once** for that batch... – marc_s May 26 '12 at 10:10
  • @marc_s Thanks, that is something I have on my 'todo' list here. My problem with a bulk copy is the logic needed for all the FK lookups though. I receive a text value, and must insert an ID value on the main imported table. – ProfK May 26 '12 at 10:18
  • 2
    Setting `AutoDetectChanges` to `false` and calling `SaveChanges` once for larger batches (as marc_s said) is the key to improve the performance: http://stackoverflow.com/a/5942176/270591 – Slauma May 26 '12 at 11:47
  • Although EF is not made for bulk imports, 1s per insert is clearly wrong. – usr May 26 '12 at 12:07
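To illustrate Slauma's suggestion: if `CurrentDbContext` is a `DbContext` (EF 4.1+), change detection can be disabled for the duration of the import. This is only a sketch under that assumption; on an `ObjectContext`-based model the property does not exist in this form.

```csharp
// Sketch: DetectChanges scans every tracked entity and runs on each Add,
// so cost grows with the number of inserted rows. Turning it off for the
// import avoids that quadratic behavior.
CurrentDbContext.Configuration.AutoDetectChangesEnabled = false;
try
{
    // ... Add many entities here, calling SaveChanges() once per batch ...
}
finally
{
    // Restore the default so the rest of the application behaves normally.
    CurrentDbContext.Configuration.AutoDetectChangesEnabled = true;
}
```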

2 Answers


Your GetTitleId pulls all Titles from the database into the application once, but then does a linear search over them for every Candidate. With nine such lookups per record, that is likely to be very expensive. Use a client-side hash table, e.g. a Dictionary with StringComparer.OrdinalIgnoreCase, instead.
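For example, here is a sketch of the lookup rewritten around a case-insensitive Dictionary (type and repository names taken from the question; it assumes title names are unique, since ToDictionary throws on duplicate keys):

```csharp
// Cache Name -> Id in a Dictionary so each lookup is O(1)
// instead of a FirstOrDefault scan over the whole list.
private Dictionary<string, Guid> _titleIds;

private Guid GetTitleId(string titleName)
{
    ITitleRepository repository = new TitleRepository();
    if (_titleIds == null)
    {
        _titleIds = repository.ListAll()
            .ToDictionary(t => t.Name, t => t.Id,
                          StringComparer.OrdinalIgnoreCase);
    }

    Guid id;
    if (!_titleIds.TryGetValue(titleName, out id))
    {
        // Unknown title: insert it once and cache the new Id.
        var title = new Title { Name = titleName };
        repository.Insert(title);
        id = title.Id;
        _titleIds.Add(titleName, id);
    }
    return id;
}
```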

Also, why don't you profile your application? Put load on it and break into the debugger ten times; wherever it stops most often is your hot spot.

usr

Thanks to the comment from marc_s above, I changed the routine to only call SaveChanges every 500 inserts, and the speed improved by about 500%.
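For reference, a minimal sketch of that change applied to the original loop (names from the question; the batch size of 500 is arbitrary, and the repository's per-entity SaveChanges is replaced by adding straight to the context):

```csharp
var cts2005 = new Cts2005Entities();
int pending = 0;

foreach (var c in cts2005.Candidates)
{
    var e = new Employee
    {
        RefNum = c.CA_EMP_ID,
        TitleId = GetTitleId(c.TITLE),
        Initials = c.CA_INITIALS,
        Surname = c.CA_SURNAME,
    };
    CurrentDbContext.Employees.Add(e);

    // One database round trip per 500 rows instead of one per row.
    if (++pending % 500 == 0)
    {
        CurrentDbContext.SaveChanges();
    }
}
CurrentDbContext.SaveChanges(); // flush the remainder
```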

ProfK