0

I am trying to insert a large number of rows (>10,000,000) into a MySQL database using EF ObjectContext (db-first). After reading the answer of this question I wrote this code (batch save) to insert about 10,000 contacts (30k rows actually; including related other rows):

// var myContactGroupId = ...;

const int maxContactToAddInOneBatch = 100;
var numberOfContactsAdded = 0;          

// IEnumerable<ContactDTO> contacts = ...

foreach (var contact in contacts)
{
    var newContact = AddSingleContact(contact); // method excerpt below

    if (newContact == null)
    {
        return;
    }

    if (++numberOfContactsAdded % maxContactToAddInOneBatch == 0)
    {
        LogAction(Action.ContactCreated, "Batch #" + numberOfContactsAdded / maxContactToAddInOneBatch);
        _context.SaveChanges();
        _context.Dispose();
        // _context = new ...
    }
}

// ...

private Contact AddSingleContact(ContactDTO contact)
{
    Validate(contact); // Simple input validations  

    // ...
    // ...

    var newContact = Contact.New(contact); // Creates a Contact entity

    // Add cell numbers
    foreach (var cellNumber in contact.CellNumbers)
    {
        var existingContactCell = _context.ContactCells.FirstOrDefault(c => c.CellNo == cellNumber);

        if (existingContactCell != null)
        {
            // Set some error message and return
            return;
        }

        newContact.ContactCells.Add(new ContactCell
        {
            CellNo = cellNumber,
        });
    }

    _context.Contacts.Add(newContact);

    _context.ContactsInGroups.Add(new ContactsInGroup
    {
        Contact = newContact,
        // GroupId =  some group id
    });

    return newContact;
}

But it seems that the more contacts are added (batchwise), it takes more time (non linear). Here is the log for batch size 100 (10k contacts). Notice the increasing time needed as the batch# increases:

12:16:48    Batch #1
12:16:49    Batch #2
12:16:49    Batch #3
12:16:50    Batch #4
12:16:50    Batch #5
12:16:50    Batch #6
12:16:51    Batch #7
12:16:52    Batch #8
12:16:53    Batch #9
12:16:54    Batch #10

...
...

12:21:26    Batch #89
12:21:32    Batch #90
12:21:38    Batch #91
12:21:44    Batch #92
12:21:50    Batch #93
12:21:57    Batch #94
12:22:03    Batch #95
12:22:10    Batch #96
12:22:16    Batch #97
12:22:23    Batch #98
12:22:29    Batch #99
12:22:36    Batch #100

It took 6 mins 48 sec. If I increase the batch size to 10,000 (requires a single batch), it takes about 26 sec (for 10k contacts). But when I try to insert 100k contacts (10k per batch), it takes a long time (for the increasing time per batch I guess).

Can you explain why it is taking increasing amount of time despite of the context being renew-ed? Is there any other idea except raw SQL?

Community
  • 1
  • 1
mshsayem
  • 17,557
  • 11
  • 61
  • 69
  • it might be related to garbage collection. Why dont you profile it? see where it is spending most of its time. – DarthVader Sep 16 '13 at 07:16

2 Answers2

0

Most answers on the question you linked to use context.Configuration.AutoDetectChangesEnabled = false; I don't see that in your example. So you should try that. You might want to consider EF6 too. It has an AddRange method on the context for this purpose see INSERTing many rows with Entity Framework 6 beta 1

Colin
  • 22,328
  • 17
  • 103
  • 197
  • 2
    that is for `DbContext`; this is `ObjectContext` which has no such property. – mshsayem Sep 16 '13 at 08:56
  • And the ObjectContext doesn't do automatic change detection. My bad...I guess raw sql is it then - especially if you are talking about 10,000,000 records. http://stackoverflow.com/a/14130937/150342 – Colin Sep 16 '13 at 09:31
0

Finally got it. Looks like the method Validate() was the culprit. It had an existence check query to check if the contact already existed. So, as the contacts are being added, the db grows and it takes more time to check as the batch# increases; mainly because the cell number field (which it was comparing) was not indexed.

mshsayem
  • 17,557
  • 11
  • 61
  • 69