0

This may have been asked a lot, but I'm still lost. I need to parse an XML file that I retrieve from Google Reader's API. Basically, it contains objects like the one below:

<object>
    <string name="id">feed/http://developer.apple.com/news/rss/news.rss</string>
    <string name="title">Apple Developer News</string>
    <list name="categories">
        <object>
            <string name="id">user/17999068807557229152/label/Apple</string>
            <string name="label">Apple</string>
        </object>
    </list>
    <string name="sortid">DB67AFC7</string>
    <number name="firstitemmsec">1317836072018</number>
    <string name="htmlUrl">http://developer.apple.com/news/</string>
</object>

I have tried NSXMLParser and it works, but it is really slow. Maybe my code is not the most efficient, but still, it can take more than 10 seconds to parse an object and save it into Core Data. I have also looked at several other libraries, but they seem a bit complicated and heavy for such a small XML file.

What do you think I should use?

Thank you.

EDIT

Here is the parser code:

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {

    if([elementName isEqualToString:@"list"] && [[attributeDict objectForKey:@"name"] isEqualToString:@"subscriptions"]){
        subscriptionListFound = YES;
    }

    if(subscriptionListFound){
        if([elementName isEqualToString:@"list"] && [[attributeDict objectForKey:@"name"] isEqualToString:@"categories"]){
            categoryFound = YES;
            currentCategoryId = [[[NSMutableString alloc] init] autorelease];
            currentCategoryLabel = [[[NSMutableString alloc] init] autorelease];
        }
        if([elementName isEqualToString:@"object"] && !subscriptionFound && !categoryFound){
            subscriptionFound = YES;
            currentSubscriptionTitle = [[[NSMutableString alloc] init] autorelease];
            currentSubscriptionId = [[[NSMutableString alloc] init] autorelease];
            currentSubscriptionHtmlURL = [[[NSMutableString alloc] init] autorelease];
        }
        if([elementName isEqualToString:@"string"] && [[attributeDict objectForKey:@"name"] isEqualToString:@"id"]){
            if(categoryFound){
                categoryIdFound = YES; 
            }
            else{
                subscriptionIdFound = YES;
            }
        }
        if([elementName isEqualToString:@"string"] && [[attributeDict objectForKey:@"name"] isEqualToString:@"title"]){
            subscriptionTitleFound = YES;
        }
        if([elementName isEqualToString:@"string"] && [[attributeDict objectForKey:@"name"] isEqualToString:@"label"]){
            categoryLabelFound = YES;
        }
        if([elementName isEqualToString:@"string"] && [[attributeDict objectForKey:@"name"] isEqualToString:@"htmlUrl"]){
            subscriptionHtmlURLFound = YES;
        }
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {   

    if([elementName isEqualToString:@"list"] && !categoryFound){
        subscriptionListFound = NO;
    }

    if([elementName isEqualToString:@"list"] && categoryFound){
        categoryFound = NO;
    }

    if([elementName isEqualToString:@"object"] && !categoryFound && subscriptionFound){        
        [self saveSubscription];
        [[NSNotificationCenter defaultCenter] postNotificationName:@"currentSubscriptionNotification" object:currentSubscriptionTitle];
        subscriptionFound = NO;
    }

    if([elementName isEqualToString:@"string"]){
        if(subscriptionIdFound == YES) {
            [currentSubscriptionId appendString:self.currentParsedCharacterData];
            subscriptionIdFound = NO;
        }
        if(subscriptionTitleFound == YES) {
            [currentSubscriptionTitle appendString:self.currentParsedCharacterData];
            subscriptionTitleFound = NO;
        }
        if(subscriptionHtmlURLFound == YES) {
            [currentSubscriptionHtmlURL appendString:self.currentParsedCharacterData];
            subscriptionHtmlURLFound = NO;
        }
        if(categoryIdFound == YES) {
            [currentCategoryId appendString:self.currentParsedCharacterData];
            categoryIdFound = NO;
        }
        if(categoryLabelFound == YES) {
            [currentCategoryLabel appendString:self.currentParsedCharacterData];
            categoryLabelFound = NO;
        }
    }

    [self.currentParsedCharacterData setString:@""];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.currentParsedCharacterData appendString:string];
}

Here is the code that saves to Core Data:

- (void) saveSubscription {

    NSFetchRequest *fetchRequest = [[[NSFetchRequest alloc] init] autorelease];
    [fetchRequest setEntity:
     [NSEntityDescription entityForName:@"Group" inManagedObjectContext:context]];
    [fetchRequest setPredicate: [NSPredicate predicateWithFormat: @"(id == %@)",self.currentCategoryId]];
    [fetchRequest setSortDescriptors: [NSArray arrayWithObject:
                                        [[[NSSortDescriptor alloc] initWithKey: @"id"
                                        ascending:YES] autorelease]]];

    NSError *error2 = nil;
    NSArray *foundGroups = [context executeFetchRequest:fetchRequest error:&error2];

    if ([foundGroups count] > 0) {
        self.currentGroupObject = [foundGroups objectAtIndex:0];
    }
    else {
        self.currentGroupObject = [NSEntityDescription insertNewObjectForEntityForName:@"Group" inManagedObjectContext:context];
        [self.currentGroupObject setId:self.currentCategoryId];
        [self.currentGroupObject setLabel:self.currentCategoryLabel];
    }

    fetchRequest = [[[NSFetchRequest alloc] init] autorelease];
    [fetchRequest setEntity:
     [NSEntityDescription entityForName:@"Subscription" inManagedObjectContext:context]];
    [fetchRequest setPredicate: [NSPredicate predicateWithFormat: @"(id == %@)", self.currentSubscriptionId]];
    [fetchRequest setSortDescriptors: [NSArray arrayWithObject:
                                       [[[NSSortDescriptor alloc] initWithKey: @"id"
                                                                    ascending:YES] autorelease]]];

    error2 = nil;
    NSArray *foundSubscriptions = [context executeFetchRequest:fetchRequest error:&error2];

    if ([foundSubscriptions count] > 0) {
        self.currentSubscriptionObject = [foundSubscriptions objectAtIndex:0];
    }
    else {
        self.currentSubscriptionObject = [NSEntityDescription insertNewObjectForEntityForName:@"Subscription" inManagedObjectContext:context];
        [self.currentSubscriptionObject setId:self.currentSubscriptionId];
        [self.currentSubscriptionObject setTitle:self.currentSubscriptionTitle];
        [self.currentSubscriptionObject setHtmlURL:self.currentSubscriptionHtmlURL];
        NSString *faviconURL = [self favIconUrlStringFromURL:self.currentSubscriptionHtmlURL];
        NSString *faviconPath = [self saveFavicon:self.currentSubscriptionTitle url:faviconURL];
        [self.currentSubscriptionObject setFaviconPath:faviconPath];
        [self.currentSubscriptionObject setGroup:self.currentGroupObject];
        [self.currentGroupObject addSubscriptionObject:self.currentSubscriptionObject];
    }

    NSError *error;
    if (![context save:&error]) {
        NSLog(@"Whoops, couldn't save: %@", [error localizedDescription]);
    }
}
Paul Tyng
Titouan de Bailleul
  • If you would add your code, we might be able to help you improve it. – vikingosegundo Feb 04 '12 at 10:50
  • I retitled the question, as "best library" questions are basically just opinion, and you are looking for specific perf enhancements in this scenario, whether it involves a new library or not – Paul Tyng Feb 05 '12 at 18:36

4 Answers

9

Your parsing logic is quite inefficient - you are doing the same test over and over again by saying

if (string and x) do this
if (string and y) do this
if (string and z) do this

Instead of

if (string)
    if (x) do this
    if (y) do this
    if (z) do this

All those unnecessary string comparisons are probably why your parsing is so slow. Same goes for all the object lookups. If you need a value multiple times, get it once and then store it in a variable.

Objective-C method calls are relatively slow and can't be optimised away by the compiler, so if the value doesn't change you should call the method once and then store the result.

So for example, this:

if([elementName isEqualToString:@"string"] && [[attributeDict objectForKey:@"name"] isEqualToString:@"id"]){
    if(categoryFound){
        categoryIdFound = YES; 
    }
    else{
        subscriptionIdFound = YES;
    }
}
if([elementName isEqualToString:@"string"] && [[attributeDict objectForKey:@"name"] isEqualToString:@"title"]){
    subscriptionTitleFound = YES;
}
if([elementName isEqualToString:@"string"] && [[attributeDict objectForKey:@"name"] isEqualToString:@"label"]){
    categoryLabelFound = YES;
}
if([elementName isEqualToString:@"string"] && [[attributeDict objectForKey:@"name"] isEqualToString:@"htmlUrl"]){
    subscriptionHtmlURLFound = YES;
}

Could be rewritten as this:

NSString *name = [attributeDict objectForKey:@"name"];
if([elementName isEqualToString:@"string"])
{
    if ([name isEqualToString:@"id"])
    {
        if(categoryFound){
            categoryIdFound = YES; 
        }
        else{
            subscriptionIdFound = YES;
        }
    }
    else if ([name isEqualToString:@"title"])
    {
        subscriptionTitleFound = YES;
    }
    else if ([name isEqualToString:@"label"])
    {
        categoryLabelFound = YES;
    }
    else if ([name isEqualToString:@"htmlUrl"])
    {
        subscriptionHtmlURLFound = YES;
    }
}

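This does far fewer comparisons: the attribute lookup and the element-name test each happen once, and the else-if chain stops at the first match.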

Nick Lockwood
1

I suggest you use GDataXML. It's quite simple to use and very fast. For further info, see the tutorial how-to-read-and-write-xml-documents-with-gdataxml.

I've already replied to a similar question on how to read attribute with GDataXML in this Stack Overflow topic: get-xml-response-value-with-gdataxml.

Community
Lorenzo B
  • I agree with Flex_Addicted on GDataXML, but I also wanted to add a link to this incredibly useful post on choosing XML parsers on iOS: http://www.raywenderlich.com/553/how-to-chose-the-best-xml-parser-for-your-iphone-project – shawnwall Feb 04 '12 at 14:57
0

In my opinion, the best library for parsing XML on iOS is TouchXML. It lets you query XML with XPath and has advanced element parsing options. You can also parse XHTML documents with it.

Parsing is very easy:

// xmlData is an NSData holding your XML (read from a file or a response)
CXMLDocument *doc = [[CXMLDocument alloc] initWithData:xmlData options:0 error:nil];
NSArray *objects = [doc nodesForXPath:@"//object" error:nil];

for (CXMLElement *object in objects) {
    NSArray *children = [object children];
    for (CXMLElement *child in children) {
        if ([[child name] isEqualToString:@"string"]) {
            // This is a <string> element.
            // Read its "name" attribute:
            NSString *name = [[child attributeForName:@"name"] stringValue];
            // Read the text between the opening and closing tags:
            NSString *value = [child stringValue];
        } else if ([[child name] isEqualToString:@"list"]) {
            // This is a <list> element.
        }
        // Add further else-if branches for other elements (e.g. <number>).
    }
}
akashivskyy
0

After having developed a few apps with needs similar to yours, I would wholeheartedly recommend the AQToolkit.

My usual setup for parsing XML is more or less like this:

  • Create a separate queue, using either GCD or NSOperationQueue
  • Set up an input stream using HTTPMessage and AQGzipInputStream

Example Code:

HTTPMessage *message = [HTTPMessage requestMessageWithMethod:@"GET" url:url version:HTTPVersion1_1];
[message setUseGzipEncoding:YES];       
AQGzipInputStream *inputstream = [[AQGzipInputStream alloc] initWithCompressedStream:[message inputStream]];
  • Hand the stream to a separate parser delegate, which creates a separate NSManagedObjectContext, and merges changes into main NSManagedObjectContext on save (NSManagedObject is not thread safe!)

Example code for initializing the context, and adding notifications for merging:

-(void)parserDidStartDocument:(AQXMLParser *)parser
{
  self.ctx=[[NSManagedObjectContext alloc] init];
  [self.ctx setMergePolicy: NSMergeByPropertyObjectTrumpMergePolicy];
  [self.ctx setPersistentStoreCoordinator: [Database db].persistentStoreCoordinator];
  NSNotificationCenter *dnc = [NSNotificationCenter defaultCenter];
  [dnc addObserver:self selector:@selector(mergeContextChanges:) name:NSManagedObjectContextDidSaveNotification object:self.ctx];  
  parsedElements = 0;
}

- (void)mergeContextChanges:(NSNotification *)notification{
  SEL selector = @selector(mergeHelper:);
  [self performSelectorOnMainThread:selector withObject:notification waitUntilDone:YES];
}

- (void)mergeHelper:(NSNotification*)saveNotification
{
    // Fault in all updated objects
    NSArray* updates = [[saveNotification.userInfo objectForKey:@"updated"] allObjects];
    for (NSInteger i = [updates count]-1; i >= 0; i--)
    {
        [[[Database db].managedObjectContext objectWithID:[[updates objectAtIndex:i] objectID]] willAccessValueForKey:nil];
    }

    // Merge
    [[Database db].managedObjectContext mergeChangesFromContextDidSaveNotification:saveNotification];
}

In my mind, choosing the right parser matters most for huge datasets. If your dataset is manageable, you have a lot to gain from a decent implementation. Using any libxml-based parser and parsing chunks of data as you receive them will give you a significant performance increase over parsing the data only after the download has finished.
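To illustrate the "parse as you receive" idea without AQToolkit, here is a minimal sketch that uses Foundation's own NSXMLParser with initWithStream: (available from iOS 5), which reads from the stream incrementally instead of needing the whole document in memory first. ObjectCounter and CountObjectsInStream are hypothetical names made up for this example, and the code assumes ARC. It only counts <object> elements; a real delegate would collect the values instead.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate for illustration: counts <object> elements as they arrive.
@interface ObjectCounter : NSObject <NSXMLParserDelegate>
@property (nonatomic, assign) NSUInteger objectCount;
@end

@implementation ObjectCounter
- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"object"]) {
        self.objectCount += 1;
    }
}
@end

// Hypothetical helper: parses XML from any NSInputStream and returns the number
// of <object> elements, or NSNotFound if parsing failed.
// The parser opens the stream itself and reads from it synchronously,
// so call this from a background queue in a real app.
static NSUInteger CountObjectsInStream(NSInputStream *stream) {
    ObjectCounter *counter = [[ObjectCounter alloc] init];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithStream:stream];
    parser.delegate = counter; // the delegate is not retained; counter stays alive in this scope
    BOOL ok = [parser parse];
    return ok ? counter.objectCount : NSNotFound;
}
```

In an app, the stream would be whatever delivers the response body; for a quick check you can wrap an in-memory NSData with [NSInputStream inputStreamWithData:].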

Depending on your datasource, libz might throw Z_BUF_ERROR (at least in the simulator). I've suggested a solution in a pull-request on the AQToolkit, but I'm quite sure there would be even better solutions out there!

Audun Kjelstrup