We recently switched our app over to using NSPersistentContainer to set up our Core Data stack. The removal of boilerplate, such as the automatic consumption of save notifications and merge handling, was appealing to us, and it's supposed to be set up very efficiently.
However, we're facing an issue when importing large data sets. Let me start by saying that our data model is rather complex: lots of entities and one-to-many relationships.
The Core Data stack used to be set up with a private queue NSManagedObjectContext, attached to the NSPersistentStoreCoordinator, to perform persistence on a background queue. The main queue context was a child of this context, and private queue contexts were created as children of the main queue context to handle saves. A fairly standard setup before the introduction of NSPersistentContainer.
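For clarity, that old three-tier stack looked roughly like this (a sketch only; the type and method names are illustrative, not our actual code):

```swift
import CoreData

// Sketch of the pre-NSPersistentContainer stack: a private queue context
// owns the coordinator, the main context is its child, and worker contexts
// are children of the main context.
final class LegacyStack {
    let persistenceContext: NSManagedObjectContext // private queue, talks to the coordinator
    let mainContext: NSManagedObjectContext        // main queue, child of persistenceContext

    init(coordinator: NSPersistentStoreCoordinator) {
        persistenceContext = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
        persistenceContext.persistentStoreCoordinator = coordinator

        mainContext = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
        mainContext.parent = persistenceContext
    }

    // Saving a worker context pushes its changes up into the main context,
    // which in turn pushes to the persistence context, which writes to disk.
    func newWorkerContext() -> NSManagedObjectContext {
        let context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
        context.parent = mainContext
        return context
    }
}
```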
However, as our data sets grew larger, profiling the app showed that Core Data was taking up a lot of CPU time on the main thread. Switching to NSPersistentContainer seemed to remedy this: a lot less activity on the main thread. We presume it's because there is less traffic going through the main queue, since the background contexts vended by NSPersistentContainer's newBackgroundContext() save directly to the store coordinator; they aren't children of the main queue context.
This appeared to be all well and good, until the data set grew. We noticed that when processing around 15,000 records (sometimes with a further 10-15,000 objects related to those records), saving the background context while an NSFetchedResultsController was observing those objects would hang the UI. Badly. For up to a minute. Obviously this is not desirable.
Here's how our persistent container is set up:
...
public init(storeURL: URL, modelName: String, configureStoreDescriptionHandler: ((NSPersistentStoreDescription, NSManagedObjectModel) -> ())? = nil) throws {
    guard let modelURL = Bundle.main.url(forResource: modelName, withExtension: "momd") else { throw StackError.modelNotFound }
    guard let model = NSManagedObjectModel(contentsOf: modelURL) else { throw StackError.modelNotCreated }

    let storeDescription = NSPersistentStoreDescription(url: storeURL)
    storeDescription.type = NSSQLiteStoreType
    configureStoreDescriptionHandler?(storeDescription, model)
    storeDescription.shouldMigrateStoreAutomatically = true
    storeDescription.shouldInferMappingModelAutomatically = true
    storeDescription.shouldAddStoreAsynchronously = false

    container = NSPersistentContainer(name: modelName, managedObjectModel: model)
    container.persistentStoreDescriptions = [storeDescription]

    var outError: StackError?
    container.loadPersistentStores { (storeDescription, error) in
        if let error = error {
            assertionFailure("Unable to load \(storeDescription) because \(error)")
            outError = .storeNotMigrated
        }
    }
    // Store loading is synchronous (shouldAddStoreAsynchronously = false),
    // so outError has been set by the time we reach this point.
    if let error = outError {
        throw error
    }

    container.viewContext.automaticallyMergesChangesFromParent = true
}

public var mainQueueManagedObjectContext: NSManagedObjectContext {
    return container.viewContext
}

public func newPrivateQueueContext() -> NSManagedObjectContext {
    return container.newBackgroundContext()
}
...
We grab a private queue context via newPrivateQueueContext(), perform our work and then save. Large data sets result in the NSFetchedResultsController hanging.
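The import itself is the usual shape; a sketch, where `stack` is an instance of the class above and `Record`/`payloads` stand in for our actual entity and source data:

```swift
// Import on a container-vended background context; its save goes straight
// to the store coordinator rather than through the view context.
let context = stack.newPrivateQueueContext()
context.perform {
    for payload in payloads {
        let record = NSEntityDescription.insertNewObject(forEntityName: "Record", into: context)
        // ... populate record's attributes and relationships from payload ...
        _ = record
    }
    do {
        // This save is the point at which the UI hangs when an
        // NSFetchedResultsController is observing the affected objects.
        try context.save()
    } catch {
        // handle/report the error appropriately
    }
}
```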
Apple recommends setting viewContext.automaticallyMergesChangesFromParent = true, and also suggests that saving directly to the persistent store is more efficient than saving through a middleman (the view context) in a parent-child configuration:
Both contexts are connected to the same persistentStoreCoordinator, which serves as their parent for data merging purposes. This is more efficient than merging between parent and child contexts.
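For reference, the pattern Apple describes boils down to something like this (assuming `container` is an already-loaded NSPersistentContainer):

```swift
// The view context merges background saves automatically...
container.viewContext.automaticallyMergesChangesFromParent = true

// ...and imports run in a container-vended background context that saves
// directly to the persistent store coordinator, not via the view context.
container.performBackgroundTask { context in
    // import work here
    do {
        try context.save()
    } catch {
        // handle the error appropriately
    }
}
```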
We have actually managed to solve this problem by removing automaticallyMergesChangesFromParent = true and making the following changes to how our private queue context is configured:
...
public var mainQueueManagedObjectContext: NSManagedObjectContext {
    return container.viewContext
}

public func newPrivateQueueContext() -> NSManagedObjectContext {
    let context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    context.parent = container.viewContext
    NotificationCenter.default.addObserver(self, selector: #selector(handlePrivateQueueContextDidSaveNotification(_:)), name: .NSManagedObjectContextDidSave, object: context)
    return context
}

@objc func handlePrivateQueueContextDidSaveNotification(_ note: Notification) {
    // The child's save has pushed its changes into the view context;
    // saving the view context persists them to disk.
    container.viewContext.performAndWait {
        try? container.viewContext.save()
    }
}
...
This, in effect, places our main and worker contexts in a parent-child configuration, which is supposed to be less efficient, according to Apple.
But it works! Data is persisted correctly to disk (verified), the data is valid (verified), and there are no more NSFetchedResultsController hangs!
This, however, raises a few questions:

- Why does Apple's recommended way to set up the NSPersistentContainer result in locking up the main queue when processing large data sets? Isn't it supposed to be more efficient? Is there something we're missing?
- Has anyone encountered an issue like this, and perhaps solved it in a different way? We can't find much information online about setting up an NSPersistentContainer to handle very large data sets.
- Can you see any issues with the way we've set up our stack, and perhaps suggest improvements to the configuration?
- It appears as if saving directly to the persistent store and having the viewContext merge the changes in is less efficient than the parent-child configuration. Could someone perhaps shed some light on this?
I should add that we've attempted to make our NSFetchedResultsController more efficient by setting fetchBatchSize and improving the predicates, to no avail.
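For completeness, the tuning we tried looks like this (the entity name, predicate and sort key are placeholders, not our real model):

```swift
// Limit the fetched results controller's working set: fetchBatchSize faults
// rows in batches instead of materialising everything at once, and a tighter
// predicate reduces the number of rows the FRC has to track.
let request = NSFetchRequest<NSManagedObject>(entityName: "Record")
request.predicate = NSPredicate(format: "isVisible == YES") // illustrative
request.sortDescriptors = [NSSortDescriptor(key: "sortIndex", ascending: true)]
request.fetchBatchSize = 50

let frc = NSFetchedResultsController(
    fetchRequest: request,
    managedObjectContext: container.viewContext,
    sectionNameKeyPath: nil,
    cacheName: nil
)
```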