
Background

Saving a large amount of data at a time is very slow.

Current Setup

In my app there's a private-queue NSManagedObjectContext as the parent, which talks to the NSPersistentStoreCoordinator directly to save data. A child main-queue context is consumed by an NSTreeController for the UI (NSOutlineView).

(My goal was to prevent any occurrence of the beach ball. Currently I work around the problem by only saving data when the app goes inactive. But since the objects scheduled for deletion haven't actually been deleted yet, they may still show up in fetch results. That's another problem I'm trying to solve.)

The Problem

When the parent context is busy saving, the child main-queue context can only wait if it needs to fetch, so the UI blocks until the save finishes.

Related Problems

I will update this question when I have more findings.

LShi

1 Answer


I'm guessing you're developing for OS X / macOS (NSTreeController & NSOutlineView). I've no experience with macOS - I develop for iOS - so you might need to take that into account when you're reading my response.

I've not yet made the switch to Swift - my code is, perhaps obviously, Objective-C...

I'll start with how I prepare the Core Data stack.

I set up two public properties in the header file:

@property (nonatomic, strong) NSManagedObjectContext *mocPrivate;
@property (nonatomic, strong) NSManagedObjectContext *mocMain;

Although this is unnecessary, I also prefer to set up private properties for my Core Data objects, including, for example:

@property (nonatomic, strong) NSPersistentStoreCoordinator *persistentStoreCoordinator;

Once I've pointed to my model URL, established my managed object model NSManagedObjectModel, pointed to my store URL for my NSPersistentStore and established my persistent store coordinator NSPersistentStoreCoordinator (PSC), I set up my two managed object contexts (MOC).
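The steps in the paragraph above are not shown in code. A minimal sketch of them might look like the following, assuming a compiled model named `<modelName>.momd` in the main bundle and a SQLite store in the user's Application Support directory. The helper name `MakeCoordinator`, the model name and the store location are all hypothetical; a real macOS app would normally put the store in a subfolder named after its bundle identifier.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: builds the coordinator that mocPrivate is later attached to.
// Assumes a compiled model named <modelName>.momd in the main bundle.
static NSPersistentStoreCoordinator *MakeCoordinator(NSString *modelName, NSError **error) {
    // 1. Point to the model URL and load the managed object model.
    NSURL *modelURL = [[NSBundle mainBundle] URLForResource:modelName withExtension:@"momd"];
    NSManagedObjectModel *model = modelURL ? [[NSManagedObjectModel alloc] initWithContentsOfURL:modelURL] : nil;
    if (!model) {
        return nil; // model missing from the bundle; *error is not set in this case
    }

    // 2. Point to the store URL (Application Support may not exist yet, so create it).
    NSFileManager *fm = [NSFileManager defaultManager];
    NSURL *supportURL = [[fm URLsForDirectory:NSApplicationSupportDirectory inDomains:NSUserDomainMask] lastObject];
    if (![fm createDirectoryAtURL:supportURL withIntermediateDirectories:YES attributes:nil error:error]) {
        return nil;
    }
    NSURL *storeURL = [supportURL URLByAppendingPathComponent:[modelName stringByAppendingPathExtension:@"sqlite"]];

    // 3. Create the coordinator and add a SQLite store to it.
    NSPersistentStoreCoordinator *psc = [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:model];
    if (![psc addPersistentStoreWithType:NSSQLiteStoreType configuration:nil URL:storeURL options:nil error:error]) {
        return nil;
    }
    return psc;
}
```

The returned coordinator would then be assigned to `self.persistentStoreCoordinator` before `mocPrivate` is created.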

Within the method that "builds" my Core Data stack, after I've completed the steps described above, I then include the following...

if (!self.mocPrivate) {
    self.mocPrivate = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    [self.mocPrivate setPersistentStoreCoordinator:self.persistentStoreCoordinator];
} else {
    // report to console the use of existing MOC
}

if (!self.mocMain) {
    self.mocMain = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
    [self.mocMain setParentContext:self.mocPrivate];
} else {
    // report to console the use of existing MOC
}

(I usually include a few NSLog lines in this code to report to my console but I've excluded that here to keep the code clean.)

Note two important aspects to this code...

  • set the private queue MOC to interact with the PSC; and
  • set the main queue MOC as the child of the private queue MOC.

Why is this done? First let's highlight a couple of important points:

  • Saves to memory are relatively fast; and
  • Saves to disc are relatively slow.

The private queue is asynchronous to the main queue. The User Interface (UI) operates on the main queue. The private queue operates on a separate thread "in the background" working to maintain context and coordinate data persistence with the PSC, perfectly managed by Core Data and iOS. The main queue operates on the main thread with the UI.

Written a different way...

  • Heavy work completing irregular (managed by Core Data) data persistence to the PSC (saves to disc) is completed in the private queue; and
  • Light work completing regular (managed by developer) data persistence to the MOC (saves to memory) is completed in the main queue.

In theory this should ensure your UI is never blocked.

But there is more to this solution. How we manage the "save" process is important...

I write a public method:

- (void)saveContextAndWait:(BOOL)wait;

I call this method from any class that needs to persist data. The code for this public method:

- (void)saveContextAndWait:(BOOL)wait {
    // 1. First
    if ([self.mocMain hasChanges]) {
        // 2. Second
        [self.mocMain performBlockAndWait:^{
            NSError *error = nil;
            if (![self.mocMain save:&error]) {
                // error handling
            } else {
                // report success to the console
            }
        }];
    } else {
        NSLog(@"%@ - %@ - CORE DATA - reports no changes to managedObjectContext MAIN_", NSStringFromClass(self.class), NSStringFromSelector(_cmd));
    }

    // 3. Third
    void (^savePrivate) (void) = ^{
        // 4. Fourth: check for changes here, on the private queue, because
        //    mocPrivate should only be messaged from its own queue
        if (![self.mocPrivate hasChanges]) {
            NSLog(@"%@ - %@ - CORE DATA - reports no changes to managedObjectContext PRIVATE_", NSStringFromClass(self.class), NSStringFromSelector(_cmd));
            return;
        }
        NSError *error = nil;
        if (![self.mocPrivate save:&error]) {
            // error handling
        } else {
            // report success to the console
        }
    };

    // 5. Fifth
    if (wait) {
        [self.mocPrivate performBlockAndWait:savePrivate];
    } else {
        [self.mocPrivate performBlock:savePrivate];
    }
}

So I'll work through this to explain what is happening.

I offer the developer the option to save and wait (block). Depending on the value passed to saveContextAndWait:, the private queue MOC "saves" using either:

  • the performBlockAndWait method (developer calls method with wait = TRUE or YES); or
  • the performBlock method (developer calls method with wait = FALSE or NO).

First, the method checks whether there are any changes to the main queue MOC. Let's not do any work unless we have to!

Second, the method makes a (synchronous) call to performBlockAndWait on the main queue MOC. This performs the call to save: inside a block and waits for it to complete before the code continues. Remember this is for regular "saves" of small data sets. The (asynchronous) performBlock option is not suitable here and in fact undermines the method, as I found when I was learning to implement this in my code: data failed to persist because the save on the main queue MOC tried to complete after the save on the private queue MOC had already finished.

Third, the method defines a small block, stored in the local variable savePrivate, that contains the code to save the private queue MOC.

Fourth, the method checks whether there are any changes to the private queue MOC. This may be unnecessary but it is harmless to include here.

Fifth, depending on the option the developer chooses (wait = YES or NO), the method passes that block (from the third step above) to either performBlockAndWait or performBlock.

In this last step, the work of persisting data to disc, from the private queue MOC to the PSC, runs on the private queue rather than on the main thread. With wait = NO, the calling thread only enqueues the save and carries on, so in theory the "save to disc" via the PSC can take as long as it likes without blocking the UI. With wait = YES, performBlockAndWait blocks the calling thread until the save finishes, so reserve it for cases where waiting is acceptable (for example, when the app is about to terminate). AND because the changes were made in the main queue MOC and pushed up to its parent, the main queue MOC already reflects them; nothing needs to be merged back into it.
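As a usage sketch for macOS, an app delegate might call the method with NO when the app resigns active (the same moment the question already saves at) and with YES when the app is terminating. The class name `CoreDataStack` is hypothetical: it stands for whichever object owns `mocPrivate`, `mocMain` and the `saveContextAndWait:` implementation shown above.

```objc
#import <Cocoa/Cocoa.h>
#import <CoreData/CoreData.h>

// Hypothetical class name for the object that owns the stack and saveContextAndWait:.
// Its @implementation is the code shown above and is not repeated here.
@interface CoreDataStack : NSObject
@property (nonatomic, strong) NSManagedObjectContext *mocPrivate;
@property (nonatomic, strong) NSManagedObjectContext *mocMain;
- (void)saveContextAndWait:(BOOL)wait;
@end

@interface AppDelegate : NSObject <NSApplicationDelegate>
@property (nonatomic, strong) CoreDataStack *stack;
@end

@implementation AppDelegate

- (void)applicationWillResignActive:(NSNotification *)notification {
    // mocMain is saved synchronously (in memory); the write to the store is only
    // enqueued on mocPrivate's queue, so the main thread does not wait for it.
    [self.stack saveContextAndWait:NO];
}

- (void)applicationWillTerminate:(NSNotification *)notification {
    // The process is about to exit, so block until the private queue has written to the store.
    [self.stack saveContextAndWait:YES];
}

@end
```

On iOS, the UIApplicationDelegate methods applicationWillResignActive: and applicationWillTerminate: (which take a UIApplication *) play the same role. Note that the mocMain save in step two always runs synchronously, so a very large change set, such as a big cascade delete, is still processed on the main thread.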

If you import large volumes of data into the app, something I am currently working on implementing, it makes sense to import this data into the private queue MOC.

The private queue MOC does two things here:

  • It coordinates data persistence (to disc) with the PSC;
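  • Because it is the parent of the main queue MOC (in memory), fetches made in the main queue MOC are passed through to it, so new fetches there see the imported data. Objects the main queue MOC has already loaded are not refreshed automatically, so they may need a refetch (or refreshObject:mergeChanges:).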
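A rough sketch of importing into the private queue MOC follows, assuming an entity named `Item` with a string attribute `name`. The function name, the entity, the attribute and the batch size are all hypothetical placeholders.

```objc
#import <CoreData/CoreData.h>

// Hypothetical import helper. Assumes an entity "Item" with a string attribute "name";
// the batch size of 500 is an arbitrary placeholder, not a tuned value.
static void ImportRecords(NSManagedObjectContext *mocPrivate, NSArray<NSDictionary *> *records) {
    const NSUInteger batchSize = 500;
    // All work happens on mocPrivate's own queue; the caller returns immediately.
    [mocPrivate performBlock:^{
        NSUInteger inserted = 0;
        for (NSDictionary *record in records) {
            @autoreleasepool {
                NSManagedObject *item = [NSEntityDescription insertNewObjectForEntityForName:@"Item"
                                                                      inManagedObjectContext:mocPrivate];
                [item setValue:record[@"name"] forKey:@"name"];
            }
            inserted++;
            // Write to the store every batchSize inserts rather than only once at the end.
            if (inserted % batchSize == 0) {
                NSError *error = nil;
                if (![mocPrivate save:&error]) {
                    NSLog(@"Import save failed: %@", error);
                    return; // leaves the block; the remaining records are not inserted
                }
            }
        }
        // Save whatever is left over from the last partial batch.
        NSError *error = nil;
        if ([mocPrivate hasChanges] && ![mocPrivate save:&error]) {
            NSLog(@"Import save failed: %@", error);
        }
    }];
}
```

While this block runs, other work that needs mocPrivate's queue has to wait for it, including the performBlockAndWait: calls in saveContextAndWait: and fetches made through the main queue MOC.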

Finally, I use NSFetchedResultsController (FRC) to manage my data fetches, which are all completed against the main queue MOC. This maintains data hierarchy. As changes are made to the data sets in either context, the FRC updates the view.
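For reference, an FRC against the main queue MOC could be set up roughly like this. The helper name, the entity `Item` and the attribute `name` are hypothetical; on macOS, NSFetchedResultsController requires 10.12 or later.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: entity "Item" and attribute "name" are placeholders.
// Call on the main thread, because mocMain uses NSMainQueueConcurrencyType.
static NSFetchedResultsController *MakeItemsController(NSManagedObjectContext *mocMain,
                                                       id<NSFetchedResultsControllerDelegate> delegate) {
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
    // An FRC requires at least one sort descriptor.
    request.sortDescriptors = @[[NSSortDescriptor sortDescriptorWithKey:@"name" ascending:YES]];

    NSFetchedResultsController *frc =
        [[NSFetchedResultsController alloc] initWithFetchRequest:request
                                            managedObjectContext:mocMain
                                              sectionNameKeyPath:nil
                                                       cacheName:nil];
    // The delegate is told (controllerWillChangeContent:, controllerDidChangeContent:, ...)
    // when objects in mocMain change, which is how the view is kept up to date.
    frc.delegate = delegate;

    NSError *error = nil;
    if (![frc performFetch:&error]) {
        NSLog(@"performFetch failed: %@", error);
    }
    return frc;
}
```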

This solution is simple! (Once I'd spent weeks wrapping my head around it and another few weeks refining my code.)

There is no requirement to monitor notifications for merges or other changes to MOC. Core Data and iOS handle everything in the background.

So if this doesn't work for you - let me know - I may have excluded or overlooked something as I wrote this code well over a year ago.

andrewbuilder
  • Thanks Andrew! My app had almost the same setup in its previous version. But due to the large dataset, there's a rolling beach ball every time I save the main-queue (child) MOC (usually after deleting a data hierarchy of tens of thousands of objects). The MOC needs to work out what to delete according to the Cascade delete rule. Even though this all happens in memory, it's still relatively slow (5-7 seconds maybe). Secondly, when the background MOC is busy saving, the main-queue MOC can't fetch data immediately, since the parent is where it gets data from. So I switched to my current setup. – LShi Mar 18 '17 at 07:39
  • The stack you presented works really well for relatively small amounts of data, and when the user (in the sense of UI) doesn't initiate fetches very often. (It won't block the UI as long as the user doesn't click anything that initiates a fetch while the parent context's queue has unfinished tasks.) I will continue to investigate this problem. I may try adding a second PSC for reading only. That may solve the problem of "can't read while writing". – LShi Mar 18 '17 at 08:01
  • Also, maybe doing the fetch asynchronously and giving the user some visual hint that the UI will update later is a good solution. But there will still be a beach ball when saving in the main-queue MOC. – LShi Mar 18 '17 at 08:10
  • Interesting problem. I'd be concerned about using two PSCs - in that you'll still have locking in the SQLite store (assuming that's what you're using). Isn't Core Data optimised for one PSC? But if you're using the 2nd PSC read-only, maybe that's not going to be a problem. What I'd suggest is to take a step back and look at what you're attempting to achieve... does your app need to show awareness of all deleted records on screen at the same time, or can you consider managing the data in batches? Would it help to fault the records you want to delete? – andrewbuilder Mar 18 '17 at 11:07
  • Regarding the Matt Gallagher article and the comment "Nothing gets close to NSDictionary", have you considered fetching relevant existing data into an `NSDictionary` or `NSArray` and storing this in memory while the app is user facing? You can then manage all data transactions in memory. Perhaps easier to fetch new data and delete older data? You'd lose the ability to use a FRC but maybe that is not a bad thing in your situation? Then once the user demands reduce, you could insert the data set in the `NSDictionary` into the private queue MOC and let Core Data manage persistence? – andrewbuilder Mar 18 '17 at 11:16
  • I do need to take a step back to reconsider the workflow/architecture. Because the data graph may be quite large, I guess it will be hard (space-wise) to keep those dictionaries or arrays. Nor can I read the whole dataset into memory. If I just read data on the user's demand, it still needs to fetch from the store at times. Recently I have some web-dev work to do, so maybe it's good for me to leave this problem for a little while and do some reading on Core Data. Hope that will help. I will keep you updated. Thank you my friend! I work in Nanjing and often go back home to Shanghai. Let me know if you come! – LShi Mar 18 '17 at 12:58