We recently switched our app over to using NSPersistentContainer to set up our Core Data stack. The removal of boilerplate, such as the automatic consumption of save notifications and merge handling, was appealing to us, and it's supposed to be set up very efficiently.
However, we're facing an issue when importing large data sets. Let me start by saying that our data model is rather complex: lots of entities and one-to-many relationships.
The Core Data stack used to be set up with a private queue NSManagedObjectContext, attached to the NSPersistentStoreCoordinator, to perform persistence on a background queue. The main queue context was a child of this context, and private queue contexts were created as children of the main queue context to handle saves. A fairly standard setup before the introduction of NSPersistentContainer.
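For clarity, that old three-tier stack looked roughly like this (a sketch only; the type and method names are illustrative, not our actual code):

```swift
import CoreData

// Sketch of the pre-NSPersistentContainer stack: a private queue context
// owns the coordinator, the main context is its child, and worker contexts
// are children of the main context.
final class LegacyStack {
    let persistenceContext: NSManagedObjectContext // private queue, talks to the coordinator
    let mainContext: NSManagedObjectContext        // main queue, child of persistenceContext

    init(coordinator: NSPersistentStoreCoordinator) {
        persistenceContext = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
        persistenceContext.persistentStoreCoordinator = coordinator

        mainContext = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
        mainContext.parent = persistenceContext
    }

    // Saving a worker context pushes its changes up into the main context,
    // which in turn pushes to the persistence context, which writes to disk.
    func newWorkerContext() -> NSManagedObjectContext {
        let context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
        context.parent = mainContext
        return context
    }
}
```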
However, as our data sets grew larger, profiling the app showed that Core Data was taking up a lot of CPU time on the main thread. Switching to NSPersistentContainer seemed to remedy this: a lot less activity on the main thread. We presume it's because there is less traffic going through the main queue, since the background contexts vended by NSPersistentContainer's newBackgroundContext() save directly to the store coordinator; they aren't children of the main queue context.
This appeared to be all well and good, until the data set grew. We noticed that when processing around 15,000 records (sometimes with a further 10-15,000 objects related to those records), saving the background context while an NSFetchedResultsController was observing those objects would hang the UI. Badly. For up to a minute. Obviously this is not desirable.
Here's how our persistent container is set up:
...
public init(storeURL: URL, modelName: String, configureStoreDescriptionHandler: ((NSPersistentStoreDescription, NSManagedObjectModel) -> ())? = nil) throws {
    guard let modelURL = Bundle.main.url(forResource: modelName, withExtension: "momd") else { throw StackError.modelNotFound }
    guard let model = NSManagedObjectModel(contentsOf: modelURL) else { throw StackError.modelNotCreated }

    let storeDescription = NSPersistentStoreDescription(url: storeURL)
    storeDescription.type = NSSQLiteStoreType
    configureStoreDescriptionHandler?(storeDescription, model)
    storeDescription.shouldMigrateStoreAutomatically = true
    storeDescription.shouldInferMappingModelAutomatically = true
    storeDescription.shouldAddStoreAsynchronously = false

    container = NSPersistentContainer(name: modelName, managedObjectModel: model)
    container.persistentStoreDescriptions = [storeDescription]

    var outError: StackError?
    container.loadPersistentStores { (storeDescription, error) in
        if let error = error {
            assertionFailure("Unable to load \(storeDescription) because \(error)")
            outError = .storeNotMigrated
        }
    }
    // Store loading is synchronous (shouldAddStoreAsynchronously = false),
    // so outError has been set by the time we reach this point.
    if let error = outError {
        throw error
    }

    container.viewContext.automaticallyMergesChangesFromParent = true
}

public var mainQueueManagedObjectContext: NSManagedObjectContext {
    return container.viewContext
}

public func newPrivateQueueContext() -> NSManagedObjectContext {
    return container.newBackgroundContext()
}
...
We grab a private queue context via newPrivateQueueContext(), perform our work and then save. Large data sets result in the NSFetchedResultsController hanging.
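The import itself is the usual shape; a sketch, where `stack` is an instance of the class above and `Record`/`payloads` stand in for our actual entity and source data:

```swift
// Import on a container-vended background context; its save goes straight
// to the store coordinator rather than through the view context.
let context = stack.newPrivateQueueContext()
context.perform {
    for payload in payloads {
        let record = NSEntityDescription.insertNewObject(forEntityName: "Record", into: context)
        // ... populate record's attributes and relationships from payload ...
        _ = record
    }
    do {
        // This save is the point at which the UI hangs when an
        // NSFetchedResultsController is observing the affected objects.
        try context.save()
    } catch {
        // handle/report the error appropriately
    }
}
```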
Apple recommends setting viewContext.automaticallyMergesChangesFromParent = true, and also suggests that saving directly to the persistent store is more efficient than saving through a middleman (the view context) in a parent-child configuration:
Both contexts are connected to the same persistentStoreCoordinator, which serves as their parent for data merging purposes. This is more efficient than merging between parent and child contexts.
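For reference, the pattern Apple describes boils down to something like this (assuming `container` is an already-loaded NSPersistentContainer):

```swift
// The view context merges background saves automatically...
container.viewContext.automaticallyMergesChangesFromParent = true

// ...and imports run in a container-vended background context that saves
// directly to the persistent store coordinator, not via the view context.
container.performBackgroundTask { context in
    // import work here
    do {
        try context.save()
    } catch {
        // handle the error appropriately
    }
}
```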
We have actually managed to solve this problem by removing automaticallyMergesChangesFromParent = true and making the following changes to how our private queue context is configured:
...
public var mainQueueManagedObjectContext: NSManagedObjectContext {
    return container.viewContext
}

public func newPrivateQueueContext() -> NSManagedObjectContext {
    let context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    context.parent = container.viewContext
    NotificationCenter.default.addObserver(self, selector: #selector(handlePrivateQueueContextDidSaveNotification(_:)), name: .NSManagedObjectContextDidSave, object: context)
    return context
}

@objc func handlePrivateQueueContextDidSaveNotification(_ note: Notification) {
    // The child's save has pushed its changes into the view context;
    // saving the view context persists them to disk.
    container.viewContext.performAndWait {
        try? container.viewContext.save()
    }
}
...
This, in effect, places our main and worker contexts in a parent-child configuration, which is supposed to be less efficient, according to Apple.
But it works! Data is persisted correctly to disk (verified), the data is valid (verified), and there are no more NSFetchedResultsController hangs!
This, however, raises a few questions:

- Why does Apple's recommended way to set up the NSPersistentContainer result in locking up the main queue when processing large data sets? Isn't it supposed to be more efficient? Is there something we're missing?
- Has anyone encountered an issue like this, and perhaps solved it in a different way? We can't find much information online about setting up an NSPersistentContainer to handle very large data sets.
- Can you see any issues with the way we've set up our stack, and perhaps suggest improvements to the configuration?
- It appears as if saving directly to the persistent store and having the viewContext merge the changes in is less efficient than the parent-child configuration. Could someone perhaps shed some light on this?
I should add that we've attempted to make our NSFetchedResultsController more efficient by setting fetchBatchSize and improving the predicates, to no avail.
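For completeness, the tuning we tried looks like this (the entity name, predicate and sort key are placeholders, not our real model):

```swift
// Limit the fetched results controller's working set: fetchBatchSize faults
// rows in batches instead of materialising everything at once, and a tighter
// predicate reduces the number of rows the FRC has to track.
let request = NSFetchRequest<NSManagedObject>(entityName: "Record")
request.predicate = NSPredicate(format: "isVisible == YES") // illustrative
request.sortDescriptors = [NSSortDescriptor(key: "sortIndex", ascending: true)]
request.fetchBatchSize = 50

let frc = NSFetchedResultsController(
    fetchRequest: request,
    managedObjectContext: container.viewContext,
    sectionNameKeyPath: nil,
    cacheName: nil
)
```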