5

I'm parsing a JSON file of about 53 MB on an iPad. The parsing itself works fine; I'm using YAJLParser, which is a SAX-style parser, and have set it up like this:

    NSData *data = [NSData dataWithContentsOfFile:path options:NSDataReadingMappedAlways|NSDataReadingUncached error:&parseError];
    YAJLParser *parser = [[YAJLParser alloc] init];
    parser.delegate = self;
    [parser parse:data];

Everything worked fine until now, but the JSON file has grown and I'm suddenly getting memory warnings on the iPad 2. It receives four memory warnings and then crashes. On the iPad 3 it works flawlessly, without any memory warnings.

I started profiling it with Instruments and found a lot of CFNumber allocations. (I stopped Instruments after a couple of minutes this time; in an earlier run that I let continue until the crash, the CFNumber allocations had reached about 60 MB or more.)

CFNumber allocations

Opening the CFNumber detail showed a huge list of allocations. One of them looked like this:

CFNumber alloc 1

and another one here:

CFNumber alloc 2

So what am I doing wrong? And what does that number (e.g. 72.8% in the last image) stand for? I'm using ARC, so I'm not calling retain or release myself.

Thanks for your help. Cheers

EDIT: I have already asked the question about how to parse such huge files here: iPad - Parsing an extremely huge json - File (between 50 and 100 mb)
So the parsing itself seems to be fine.

gasparuff
  • That number means it's 72.8% likely that this part of the code is causing the major issue you're facing (most of the time it points in the right direction, with a few exceptions). Just because you're using ARC doesn't mean your code is 100% error-free or leak-free. For instance, circular references can still cause leaks. Try running the memory analyser and see whether your "virtual memory" is getting too large. Do post your feedback – nsuinteger Aug 27 '13 at 21:25
  • From what I understand, CoreFoundation tries to minimize the number of instances of objects such as numbers and strings. Is the property "kundennr" defined as "copy" by any chance? It looks like you might be making copies every time you assign that property and "currentWarengruppeVK". That would negate the built-in efficiencies provided by CoreFoundation. – Saltymule Aug 27 '13 at 21:51
  • One potential problem with the YAJL parser is that it passes NSObjects for the primitive JSON values (String, True, False, Number, Null). That is, it internally has to allocate NSObjects for that. Likely, these objects also will be put into an autorelease pool. A better approach would not allocate anything for passing the JSON primitive values to the parser's delegate. Oftentimes (like for creating CD managed objects), this is unnecessary, too. CD managed objects will create copies for their properties anyway. In short: IMO, YAJL seems suboptimal for your problem. – CouchDeveloper Aug 28 '13 at 07:42
  • A minor hint with a potentially tremendous healing effect: don't use an NSNumber for a "Kundennummer". You get into serious trouble if your "Kundennummer" does not fit into an integer or contains non-numeric characters. NSNumber will not warn you! Better use a string. – CouchDeveloper Aug 28 '13 at 07:52
  • @Dan_Gabicoware, no, kundennr is defined like this `@property (nonatomic, retain) NSNumber * kundennr;`, it's an entity which was created by Xcode (NSManagedObject subclass). – gasparuff Aug 28 '13 at 12:25

2 Answers

5

See Apple's Core Data documentation on Efficiently Importing Data, particularly "Reducing Peak Memory Footprint".

You will need to make sure you don't have too many new entities in memory at once, which involves saving and resetting your context at regular intervals while you parse the data, as well as using autorelease pools well.

The general pseudocode would be something like this:

while (there is new data) {
    @autoreleasepool {
        importAnItem();
        if (we have imported 100 items since the last save) {
            [context save:...];
            [context reset];
        }
    }
}

So basically, put an autorelease pool around your main loop or parsing code. Count how many NSManagedObject instances you have created, and periodically save and reset the managed object context to flush these out of memory. This should keep your memory footprint down. The number 100 is arbitrary and you might want to experiment with different values.
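A minimal Objective-C sketch of the pseudocode above, under a few assumptions that are not part of the original answer: the parsed items arrive as an array of dictionaries, the entity is called `Kunde` with a `kundennr` attribute, the helper name `ImportItems` is hypothetical, and the function is called on the context's own thread or queue. The error is held in a strong local variable so that it survives the drain of the per-item autorelease pool.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Hypothetical helper: inserts one managed object per dictionary in `items`
// and saves + resets the context every `batchSize` objects.
static BOOL ImportItems(NSArray *items, NSManagedObjectContext *context,
                        NSUInteger batchSize, NSError **outError) {
    NSUInteger pending = 0;  // objects inserted since the last save
    for (NSDictionary *item in items) {
        NSError *saveError = nil;  // strong reference, outlives the pool below
        BOOL ok = YES;
        @autoreleasepool {
            // Entity and key names are assumptions for illustration.
            NSManagedObject *kunde =
                [NSEntityDescription insertNewObjectForEntityForName:@"Kunde"
                                              inManagedObjectContext:context];
            [kunde setValue:item[@"kundennr"] forKey:@"kundennr"];
            pending++;
            if (pending >= batchSize) {
                ok = [context save:&saveError];
                if (ok) {
                    [context reset];  // drop the saved objects from memory
                    pending = 0;
                }
            }
        }
        if (!ok) {
            if (outError) *outError = saveError;
            return NO;
        }
    }
    // Save whatever is left over from the last partial batch.
    if (pending > 0) {
        return [context save:outError];
    }
    return YES;
}
```

Resetting the counter after each save is what keeps the batches at a fixed size; without it the context would be saved after every item once the threshold is passed.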

Because you are saving the context for each batch, you may want to import into a temporary copy of your store in case something goes wrong and leaves you with a partial import. When everything is finished you can overwrite the original store.

Mike Weller
  • Thanks for your answer. But I'm already doing the `[context reset]` after every save. My main loop is inside a `dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0), ^{})`, but I'm not using the `@autoreleasepool`-thingy, since I thought I wouldn't need to care about that when using ARC. – gasparuff Aug 27 '13 at 13:34
  • Try wrapping the code in your main loop with an `@autoreleasepool` (inside the loop, not outside). – Mike Weller Aug 27 '13 at 13:36
  • With the `@autoreleasepool` inside the GCD? – gasparuff Aug 27 '13 at 13:42
  • Your `[NSNumber numberWithInteger:...]` line seems to be causing the memory spike, so it is probably adding many numbers to the autorelease pool, which isn't getting drained until the GCD block finishes. You therefore need to put your own autorelease pool around the code in your main loop body to keep this pool count down. – Mike Weller Aug 27 '13 at 13:45
  • @gasparuff: "but I'm not using the @autoreleasepool-thingy, since I thought I won't need to care about that when I'm using ARC." ARC does not remove the need to manage the lifecycle of autorelease pools. It just inserts calls to `-autorelease` for you when they would be required. – Rob Napier Aug 27 '13 at 14:00
  • @gasparuff As Rob Napier puts it, it is better to use an autorelease pool inside the GCD block http://stackoverflow.com/questions/4141123/do-you-need-to-create-an-nsautoreleasepool-within-a-block-in-gcd – RK- Aug 27 '13 at 17:21
  • I have put the autorelease pool inside the GCD block, but for some reason it makes no difference :-(. Do I also have to put an autorelease pool into the parser delegate methods? – gasparuff Aug 28 '13 at 09:11
  • Ok, now after I added the `@autoreleasepool` also into the YajlParser delegate methods it worked. But there's another problem now - I'm holding an array of objects (`Warengruppe`) of which some are getting added to the `Kunde` object (1:n). I'm using that array to avoid fetching it back from CoreData each time. But now after I'm doing the `[context reset]`, it crashes and tells me that it can't do that because these objects are in a different context :-(. Now I'm refetching them by their `managedObjectId`, but this made the parsing task take 3 times as long as usual :-( – gasparuff Aug 28 '13 at 12:31
  • If there are objects you don't want to fault out, then avoid using `reset`. You might not need it at all. Or you can fault out individual objects you don't need with `refreshObject:mergeChanges:`. – Mike Weller Aug 28 '13 at 13:32
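
The autorelease pool advice from the comments above can be sketched as follows. This is an illustration only: `SumOnBackgroundQueue` and the loop body are made up, and stand in for any long-running loop on a GCD queue that creates temporary objects such as `NSNumber` instances. The pool sits inside the loop, so temporaries are released after each iteration instead of piling up until the block finishes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical example: sums 0..count-1 on a background queue, going through
// a temporary NSNumber on every iteration.
static void SumOnBackgroundQueue(NSUInteger count,
                                 void (^completion)(long long sum)) {
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0), ^{
        long long sum = 0;
        for (NSUInteger i = 0; i < count; i++) {
            // Pool inside the loop body: autoreleased temporaries created
            // here are released at the end of each iteration.
            @autoreleasepool {
                NSNumber *n = [NSNumber numberWithInteger:(NSInteger)i];
                sum += [n longLongValue];
            }
        }
        // Called on the background queue once the loop has finished.
        completion(sum);
    });
}
```

With a callback-driven parser the loop lives inside the parser rather than in your own code, which is consistent with the asker's report that the pool only helped once it was also added inside the parser delegate methods.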
1

Try calling [self.managedObjectContext refreshObject:obj mergeChanges:NO] after a certain number of insert operations. This turns the NSManagedObjects back into faults and frees up some memory.
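
A small sketch of that idea using `refreshObject:mergeChanges:`, assuming you keep track of the objects inserted since the last save. The helper name `SaveAndFault` is hypothetical. The context is saved first, because refreshing with `mergeChanges:NO` discards any unsaved changes on the object.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Hypothetical helper: saves pending changes, then turns each object in
// `objects` back into a fault so its property data can be freed.
static BOOL SaveAndFault(NSManagedObjectContext *context, NSArray *objects,
                         NSError **error) {
    // Save first: mergeChanges:NO throws away unsaved changes.
    if ([context hasChanges] && ![context save:error]) {
        return NO;
    }
    for (NSManagedObject *object in objects) {
        [context refreshObject:object mergeChanges:NO];
    }
    return YES;
}
```

Unlike `reset`, this leaves every other object registered with the context untouched, so objects you deliberately keep around remain valid.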

Apple Docs on provided methods

  • I tried this, unfortunately this didn't really help. And it's `[self.managedObjectContext refreshObject:obj mergeChanges:NO]` :-) – gasparuff Aug 29 '13 at 08:51