
I am inserting tens of thousands of objects into my Core Data entity. I have a single `NSManagedObjectContext` and I am calling `save()` on it every time I add an object. It works, but while the import is running, memory usage keeps climbing from about 27 MB to 400 MB, and it stays at 400 MB even after the import is finished.
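
Roughly speaking, the import loop looks something like this (a minimal sketch only; the entity, attribute, and variable names are placeholders):

for item in importData {
    let newObject = NSEntityDescription.insertNewObjectForEntityForName("MyEntity",
        inManagedObjectContext: managedObjectContext) as! MyManagedObject
    newObject.attribute1 = item.whatever

    // saving once per object, i.e. tens of thousands of times
    do {
        try managedObjectContext.save()
    } catch {
        print(error)
    }
}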


There are a number of SO questions about batch inserts, and everyone says to read Efficiently Importing Data, but it is written in Objective-C and I am having trouble finding real Swift examples that solve this problem.

Suragch

1 Answer


There are a few things you should change:

  • Create a separate managed object context with the `PrivateQueueConcurrencyType` concurrency type and do your inserts asynchronously on it.
  • Don't save after inserting every single entity object. Insert your objects in batches and then save each batch. A batch size might be something like 1000 objects.
  • Use `autoreleasepool` and `reset()` to release the in-memory objects after each batch is inserted and saved.

Here is how this might work:

import UIKit
import CoreData

let managedObjectContext = NSManagedObjectContext(concurrencyType: NSManagedObjectContextConcurrencyType.PrivateQueueConcurrencyType)
managedObjectContext.persistentStoreCoordinator = (UIApplication.sharedApplication().delegate as! AppDelegate).persistentStoreCoordinator // or wherever your coordinator is

managedObjectContext.performBlock { // runs asynchronously on the context's private queue

    while true { // loop through each batch of inserts

        // Fetch the next batch before entering the autorelease pool so that
        // we can break out of the while loop when there is nothing left to
        // import (break is not allowed inside the autoreleasepool closure).
        guard let batch = getNextBatchOfObjects() else { break }

        autoreleasepool {
            for item in batch {
                let newObject = NSEntityDescription.insertNewObjectForEntityForName("MyEntity", inManagedObjectContext: managedObjectContext) as! MyManagedObject
                newObject.attribute1 = item.whatever
                newObject.attribute2 = item.whoever
                newObject.attribute3 = item.whenever
            }
        }

        // only save once per batch insert
        do {
            try managedObjectContext.save()
        } catch {
            print(error)
        }

        // drop the saved objects from memory before starting the next batch
        managedObjectContext.reset()
    }
}
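
Note that `getNextBatchOfObjects()` is not defined above. Here is a minimal sketch of one way it could be written, assuming the raw import data is already sitting in an array of plain (non-managed) items of a hypothetical `MyImportItem` type:

// Hypothetical helper (not part of the code above): hands back the source
// items in chunks of batchSize, or nil when everything has been consumed.
let batchSize = 1000
var batchStart = 0

func getNextBatchOfObjects() -> [MyImportItem]? {
    guard batchStart < importData.count else { return nil }
    let batchEnd = min(batchStart + batchSize, importData.count)
    let batch = Array(importData[batchStart ..< batchEnd])
    batchStart = batchEnd
    return batch
}

In practice the batches could just as well be read lazily from a file or a network response; the important part is that only one batch's worth of objects is alive in memory at a time.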

Applying these principles kept my memory usage low and also made the mass insert faster.


Further reading

  • Efficiently Importing Data (the old Apple docs link is broken; if you can find it, please help me add it)
  • Core Data Performance
  • Core Data (General Assembly post)

Update

The above answer has been completely rewritten. Thanks to @Mundi and @MartinR in the comments for pointing out a mistake in my original answer, and thanks to @JodyHagins in this answer for helping me understand and solve the problem.

Suragch
  • It seems in your code you are using the same managed object context, not a new one. – Mundi Aug 16 '15 at 11:50
  • The managed object context gets recreated at every `while` loop in my example above. The `while` loop represents one batch of inserts, so a single batch uses the same managed object context, but the next batch creates a new one. My problem in the past was that I made the context a class property and never changed it. – Suragch Aug 16 '15 at 12:02
  • @Suragch: That depends on how the `managedObjectContext` property is implemented in the Application delegate, but the "usual" implementation is a lazy property which creates the context once for the lifetime of the app. In that case you are reusing the same context as Mundi said. – Martin R Aug 16 '15 at 13:34
  • I wanted to ask more about the meaning of these comments so I opened a new question: [Where should NSManagedObjectContext be created?](http://stackoverflow.com/questions/32042637/where-should-nsmanagedobjectcontext-be-created) – Suragch Aug 17 '15 at 04:11
  • Fantastic - I have everything except the autorelease pool. Thanks. – Duncan Groenewald Jun 20 '18 at 14:28
  • hi can you help me with this https://stackoverflow.com/questions/50055953/how-to-speed-up-updating-relationship-among-tables-after-one-or-both-tables-are?noredirect=1#comment90842008_50055953 thanks – Iraniya Naynesh Aug 21 '18 at 11:20
  • @IraniyaNaynesh, I stopped using CoreData because I am developing for Android and iOS and it is easier to just use SQLite directly with both of them. – Suragch Aug 21 '18 at 11:44
  • "every block submitted through the perform(_:) method gets wrapped in a autorelease pool." You only need the autorelease pool for performAndWait. – Mycroft Canner Nov 28 '19 at 23:45