
I have a loop, abridged here, that performs an import of employee records, as follows:

var cts2005 = new Cts2005Entities();
IEmployeeRepository repository = new EmployeeRepository();

foreach (var c in cts2005.Candidates)
{
    var e = new Employee();

    e.RefNum = c.CA_EMP_ID;
    e.TitleId = GetTitleId(c.TITLE);
    e.Initials = c.CA_INITIALS;
    e.Surname = c.CA_SURNAME;

    repository.Insert(e);
}

There are actually several more fields, and a total of nine lookups like GetTitleId(c.TITLE) above. Code for these is all exactly like this:

private List<Title> _titles;
private Guid GetTitleId(string titleName)
{
    ITitleRepository repository = new TitleRepository();
    if (_titles == null)
    {
        _titles = repository.ListAll().ToList();
    }
    var title = _titles.FirstOrDefault(t => String.Compare(t.Name, titleName, StringComparison.OrdinalIgnoreCase) == 0);
    if (title == null)
    {
        title = new Title { Name = titleName };
        repository.Insert(title);
        _titles.Add(title);
    }
    return title.Id;
}

All repository.Insert() calls look like this, except the entity types differ:

public void Insert(Employee entity)
{
    CurrentDbContext.Employees.Add(entity);
    CurrentDbContext.SaveChanges();
}

All PKs are Guids. I know that could be a minor issue, but I didn't expect it to have such a large effect at volumes this small.

I have done no tuning or optimization on this routine yet, as it was only for my small test db, but yesterday I was forced to do a surprise import of 6000 records. Toward the end, this process had slowed to about 1 sec per record, which is quite dismal. I wouldn't have expected high speeds without some tuning, but nothing as slow as that.

Is there anything obviously, grossly wrong with my method here?

marc_s
ProfK
  • Entity Framework is not designed or intended to be used for bulk imports. For that, you should investigate the [SqlBulkCopy](http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx) class in .NET. Also: if you insist on using EF for bulk insert, **do not** call `.SaveChanges()` for every entity - insert x entities one after another, and then call `.SaveChanges()` **once** for that batch... – marc_s May 26 '12 at 10:10
  • @marc_s Thanks, that is something I have on my 'todo' list here. My problem with a bulk copy is the logic needed for all the FK lookups though. I receive a text value, and must insert an ID value on the main imported table. – ProfK May 26 '12 at 10:18
  • 2
    Setting `AutoDetectChanges` to `false` and calling `SaveChanges` once for larger batches (as marc_s said) is the key to improve the performance: http://stackoverflow.com/a/5942176/270591 – Slauma May 26 '12 at 11:47
  • Although EF is not made for bulk imports, 1s per insert is clearly wrong. – usr May 26 '12 at 12:07
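To illustrate Slauma's suggestion: if `CurrentDbContext` is a `DbContext` (EF 4.1+), change detection can be disabled for the duration of the import. This is only a sketch under that assumption; on an `ObjectContext`-based model the property does not exist in this form.

```csharp
// Sketch: DetectChanges scans every tracked entity and runs on each Add,
// so cost grows with the number of inserted rows. Turning it off for the
// import avoids that quadratic behavior.
CurrentDbContext.Configuration.AutoDetectChangesEnabled = false;
try
{
    // ... Add many entities here, calling SaveChanges() once per batch ...
}
finally
{
    // Restore the default so the rest of the application behaves normally.
    CurrentDbContext.Configuration.AutoDetectChangesEnabled = true;
}
```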

2 Answers


Your GetTitleId pulls all Titles from the database into the application once, but then does a linear search over them for every Candidate. With nine such lookups per record, that is likely to be very expensive. Use a client-side hash table, e.g. a Dictionary with StringComparer.OrdinalIgnoreCase, instead.
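For example, here is a sketch of the lookup rewritten around a case-insensitive Dictionary (type and repository names taken from the question; it assumes title names are unique, since ToDictionary throws on duplicate keys):

```csharp
// Cache Name -> Id in a Dictionary so each lookup is O(1)
// instead of a FirstOrDefault scan over the whole list.
private Dictionary<string, Guid> _titleIds;

private Guid GetTitleId(string titleName)
{
    ITitleRepository repository = new TitleRepository();
    if (_titleIds == null)
    {
        _titleIds = repository.ListAll()
            .ToDictionary(t => t.Name, t => t.Id,
                          StringComparer.OrdinalIgnoreCase);
    }

    Guid id;
    if (!_titleIds.TryGetValue(titleName, out id))
    {
        // Unknown title: insert it once and cache the new Id.
        var title = new Title { Name = titleName };
        repository.Insert(title);
        id = title.Id;
        _titleIds.Add(titleName, id);
    }
    return id;
}
```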

Also, why don't you profile your application? Put load on it and break into the debugger ten times; wherever it stops most often is your hot spot.

usr

Thanks to the comment from marc_s above, I changed the routine to only call SaveChanges every 500 inserts, and the speed improved by about 500%.
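For reference, a minimal sketch of that change applied to the original loop (names from the question; the batch size of 500 is arbitrary, and the repository's per-entity SaveChanges is replaced by adding straight to the context):

```csharp
var cts2005 = new Cts2005Entities();
int pending = 0;

foreach (var c in cts2005.Candidates)
{
    var e = new Employee
    {
        RefNum = c.CA_EMP_ID,
        TitleId = GetTitleId(c.TITLE),
        Initials = c.CA_INITIALS,
        Surname = c.CA_SURNAME,
    };
    CurrentDbContext.Employees.Add(e);

    // One database round trip per 500 rows instead of one per row.
    if (++pending % 500 == 0)
    {
        CurrentDbContext.SaveChanges();
    }
}
CurrentDbContext.SaveChanges(); // flush the remainder
```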

ProfK