
I've got a JSON object containing 200,000 items. I need to iterate through these objects, determine whether each one already exists, and perform the relevant action (insert / update / delete). The shell for this is shown below. Granted, it's not actually saving anything yet; it was more to see how long this approach would take. It takes about 8 minutes to process on an iPhone 4, which seems insane considering no changes are even being made yet.

Is there a more efficient way to be handling this?

Any advice or pointers would be greatly appreciated.

- (void)progressiveInsert
{
    prodAdd = 0;
    prodUpdate = 0;
    prodDelete = 0;

    dispatch_queue_t backgroundDispatchQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0);

    dispatch_async(backgroundDispatchQueue, ^{
        _productDBCount = 0;

        NSLog(@"Background Queue");
        NSLog(@"Number of products in jsonArray: %lu", (unsigned long)[_products count]);

        NSManagedObjectContext *backgroundThreadContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSConfinementConcurrencyType];
        [backgroundThreadContext setPersistentStoreCoordinator:_persistentStoreCoordinator];
        [backgroundThreadContext setUndoManager:nil];

        NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
        [fetchRequest setEntity:[NSEntityDescription entityForName:@"Products" inManagedObjectContext:backgroundThreadContext]];
        [fetchRequest setIncludesSubentities:NO]; // Omit subentities. Default is YES (i.e. include subentities)
        [fetchRequest setFetchLimit:1];

        [_products enumerateObjectsUsingBlock:^(id product, NSUInteger idx, BOOL *stop) {

            NSPredicate *predicate = [NSPredicate predicateWithFormat:@"code == %@", [product valueForKey:@"product_code"]];
            [fetchRequest setPredicate:predicate];

            // Fetch on the background context (not the main-thread context).
            // A nil result means an error; an empty array means no match.
            NSError *err = nil;
            NSArray *fetchedObjects = [backgroundThreadContext executeFetchRequest:fetchRequest error:&err];

            if ([fetchedObjects count] == 0) {

                if ([[product valueForKey:@"delete"] isEqualToNumber:@YES]) {
                    prodDelete += 1;
                } else {
                    prodAdd += 1;
                }

            } else {

                if ([[product valueForKey:@"delete"] isEqualToNumber:@YES]) {
                    prodDelete += 1;
                } else {
                    prodUpdate += 1;
                }

            }

            dispatch_sync(dispatch_get_main_queue(), ^{

                self.productDBCount += 1;
                float progress = ((float)self.productDBCount / (float)self.totalCount);
                _downloadProgress.progress = progress;

                if (_productDBCount == _totalCount) {
                    NSLog(@"Finished processing");
                    _endProcessing = [NSDate date];
                    [_btn.titleLabel setText:@"Finish"];
                    NSLog(@"Processing time: %f", [_endProcessing timeIntervalSinceDate:_startProcessing]);
                    NSLog(@"Update: %i // Add: %i // Delete: %i", prodUpdate, prodAdd, prodDelete);
                    [self completeUpdateProcess];
                }

            });

        }];

    });
}
Luke Smith
  • Swift examples for [batch insert](http://stackoverflow.com/a/32034101/3681880) and [batch delete](http://stackoverflow.com/a/32031690/3681880) – Suragch Aug 16 '15 at 11:07

3 Answers


Have a look at Implementing Find-or-Create Efficiently in the "Core Data Programming Guide".

(Update: This chapter does not exist anymore in the current Core Data Programming Guide. An archived version can be found at http://web.archive.org/web/20150908024050/https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/CoreData/Articles/cdImporting.html.)

One of the key ideas is not to execute one fetch request per product, but execute a "bulk fetch" with a predicate like

[NSPredicate predicateWithFormat:@"code IN %@", productCodes]

where productCodes is an array of "many" product codes from your JSON data. Of course you have to find the optimal "batch size".
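
A sketch of what that batched find-or-create loop might look like; the variable names (`context`, the batch size of 500) are placeholders, not something from the question, and the attribute names are taken from the posted code:

```objc
NSUInteger batchSize = 500; // tune this experimentally

for (NSUInteger i = 0; i < [_products count]; i += batchSize) {
    NSRange range = NSMakeRange(i, MIN(batchSize, [_products count] - i));
    NSArray *batch = [_products subarrayWithRange:range];
    NSArray *codes = [batch valueForKey:@"product_code"];

    // One fetch for the whole batch instead of one fetch per product.
    NSFetchRequest *request = [[NSFetchRequest alloc] initWithEntityName:@"Products"];
    request.predicate = [NSPredicate predicateWithFormat:@"code IN %@", codes];

    NSError *error = nil;
    NSArray *existing = [context executeFetchRequest:request error:&error];

    // Index the fetched objects by code for O(1) lookup.
    NSMutableDictionary *existingByCode = [NSMutableDictionary dictionaryWithCapacity:[existing count]];
    for (NSManagedObject *object in existing) {
        existingByCode[[object valueForKey:@"code"]] = object;
    }

    for (id product in batch) {
        NSManagedObject *match = existingByCode[[product valueForKey:@"product_code"]];
        // match == nil -> insert; otherwise update (or delete if flagged).
    }

    [context save:&error]; // save per batch to keep memory bounded
}
```

Saving and resetting the context once per batch (rather than once per object, or once at the very end) also keeps the memory footprint under control.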

Martin R
  • Ah, ok. That makes sense. Would that be something I should do within the existing iteration of '_products', or in a different way? For example, adding the product code to an array. Then, when the array count reaches X (100), it performs the batch fetch and subsequently the CoreData actions (insert/update/delete). – Luke Smith Nov 27 '13 at 18:11
  • @LukeSmith: I don't think there is a standard way to do this. What you described is one possible solution. – Martin R Nov 27 '13 at 18:31
  • @Martin R, the link is broken – Rafał Sroka Feb 17 '16 at 14:48
  • @Bearwithme: The Core Data Programming Guide is now here: https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/CoreData/index.html. It has been substantially rewritten (see "Document Revision History"). Unfortunately, various parts have been *removed*, and "Implementing Find-or-Create Efficiently" seems to be one of these. – Martin R Feb 17 '16 at 15:08

With that many objects, I think you need to start being very clever about your data and system, and look for other ways to trim your items before fetching 200K JSON objects. You say you're using Core Data on an iPhone, but you don't specify whether this is a client/server application (hitting a web server from the phone). I will try to keep my suggestions general.

Really, you should think beyond your current JSON and consider other data/metadata that can provide hints about what you actually need to fetch prior to the merge/update. It sounds like you're synchronizing two databases (phone & remote) and using JSON as your means of transfer.

  1. Can you timestamp your data? If you know the last time you updated your phone DB, you need only pull the data changed after that time.
  2. Can you send your data in sections/partitions? Groupings of 1000-10000 might be much more manageable.
  3. Can you partition your data into sections more or less relevant to the user/app? In this way, items that the user touches first are updated first.
  4. If your data is geographic, can you send data close to region of interest first?
  5. If your data is products, can you send data that the user has looked at more recently first?
  6. If your data is hierarchical, can you mark root nodes as changed (or again timestamp) and only update sub-trees that have changed?
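The timestamp idea in point 1 might look roughly like this on the client side; `lastSyncDate`, the URL, and the `since` query parameter are all assumptions about a hypothetical API, not something from the question:

```objc
// Hypothetical delta request: ask the server only for records changed
// since the last successful sync, instead of the full 200K dump.
NSDate *lastSyncDate = [[NSUserDefaults standardUserDefaults] objectForKey:@"lastSyncDate"];
NSTimeInterval since = [lastSyncDate timeIntervalSince1970];

NSURL *url = [NSURL URLWithString:
    [NSString stringWithFormat:@"https://example.com/products?since=%.0f", since]];

// ... download, parse, and merge only the delta ...

// Record the sync point only after the merge succeeds.
[[NSUserDefaults standardUserDefaults] setObject:[NSDate date] forKey:@"lastSyncDate"];
```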

I would be hesitant in any system, whether networked or even local DB, to attempt to merge updates from a 200K list of items unless it were a very simple list (like a numeric merge sort). It's a tremendous waste of time and network resources, and it won't make your customers very happy.

Andrew Philips
  • The intention is for the bulk of these records to be included in a pre-populated database. However, as these records can require updating at a rate of approximately 1000/day, if someone didn't update more regularly than once a week, it's going to take a long time. Ideally, no-one should ever have to update this many items at once, it's just a precaution. Subsequent updates are requested with the ID of the last update received (serves the same purpose as your timestamp suggestion), so that only relevant updates are received. – Luke Smith Nov 28 '13 at 09:15
  • I have two different update paths. One is for when there are no records in the CD Entity; it just does a bulk import of the 'current' data (not updates, the latest complete version of the dataset). This takes about 4 minutes on an iPhone 4. The other is for when there are existing records, and it needs to do an INSERT/UPDATE/DELETE iteration (as described in this post). – Luke Smith Nov 28 '13 at 09:19
  • Luke, even if you find a way to speed up your CD changes 10x (which I'm sure you'd love for smaller updates), you're still looking at 30-60sec for 200K. Even at 1000/day, that's 7000/wk or 10 secs w/ no speed up. I keep wondering about your system design. What's your use case? Is the app often used when there's a network available or mostly disconnected and user touches a button to update the data? Is it automatic? Do you really require every value instantly updated on the phone? Google can autocomplete search terms over the web, what's your burning need to have everything present immediately? – Andrew Philips Nov 29 '13 at 20:34
  • I can't give any specifics. Essentially it's an information catalogue for products, which will commonly be used when there is no network connectivity available. We've toyed with the idea of doing a live lookup, as and when needed, but it simply wouldn't be viable at the moment. The updates would be user triggered, most likely with some form of configurable notification frequency to ensure it's done regularly. The app is essentially useless without the data, and also presents a legal concern if certain information is missing. – Luke Smith Dec 02 '13 at 10:21
  • OK, I think we're at the limit for how I can help you - sorry. I've done some work with CD, but you've got a challenging & constrained problem and will have to resort to bolt tightening to squeeze as much performance out of this as you can. You might consider switching away from CD to an alt. DB for more performance. That has its own costs. Here are two great [blog](http://sealedabstract.com/code/you-should-use-core-data) [posts](http://inessential.com/2010/02/26/on_switching_away_from_core_data) that discuss why you should use CD, and when it might be appropriate to shift. – Andrew Philips Dec 02 '13 at 22:33

Don't work on individual items, batch them. Currently you make lots of fetch requests to the context, and these take time (use the Core Data instrument in Instruments to take a look). Set the batch size for your processing to 100 initially, fetch that whole group of ids in a single request, and then check for existence locally against the fetch results array.
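
One possible sketch of that batched existence check; `context` and `batchCodes` (the product codes for the current batch of 100) are placeholders, and the attribute names come from the question:

```objc
// Fetch only the "code" attribute for the batch, as dictionaries,
// so Core Data doesn't materialize full managed objects.
NSFetchRequest *request = [[NSFetchRequest alloc] initWithEntityName:@"Products"];
request.predicate = [NSPredicate predicateWithFormat:@"code IN %@", batchCodes];
request.resultType = NSDictionaryResultType;
request.propertiesToFetch = @[@"code"];

NSError *error = nil;
NSArray *rows = [context executeFetchRequest:request error:&error];
NSSet *existingCodes = [NSSet setWithArray:[rows valueForKey:@"code"]];

for (id product in batch) {
    BOOL exists = [existingCodes containsObject:[product valueForKey:@"product_code"]];
    // insert when !exists, otherwise update/delete as required
}
```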

Wain