
Comparing NSXMLParser in Objective-C and in Swift shows a large performance discrepancy. With only didStartElement, didEndElement and foundCharacters registered, throughput is ~17 MB/s in Objective-C but only ~1.4 MB/s in Swift (when casting to String, see below). Both programs are built in Release (optimized) mode.

Objective-C:

#import <Foundation/Foundation.h>

@interface MyDelegate: NSObject <NSXMLParserDelegate> {
    @public
    int didStartElement;
    int didEndElement;
    int foundCharacters;
}
@end

@implementation MyDelegate
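-(instancetype)init {
    // Call the superclass initializer before touching instance variables.
    self = [super init];
    if (self) {
        didStartElement = 0;
        didEndElement = 0;
        foundCharacters = 0;
    }
    return self;
}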

-(void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {
    didStartElement += 1;
}

-(void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    didEndElement += 1;
}

-(void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    foundCharacters += 1;
}
@end

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        NSURL *input = [NSURL fileURLWithPath: [[NSProcessInfo processInfo] arguments][1]];

        NSError *error;

        if (![input checkResourceIsReachableAndReturnError:&error]) {
            // Never pass a non-literal string as the NSLog format argument.
            NSLog(@"%@", error);
            abort();
        }

        NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:input];
        MyDelegate *delegate = [[MyDelegate alloc] init];
        parser.delegate = delegate;

        NSDate *start = [NSDate new];
        if (![parser parse]) {
            NSLog(@"%@", parser.parserError);
        }
        NSDate *end = [NSDate new];

        NSLog(@"Done. #didStartElement: %d, #didEndElement: %d, #foundCharacters: %d", delegate->didStartElement, delegate->didEndElement, delegate->foundCharacters);

        NSDictionary *attrs = [[NSFileManager defaultManager] attributesOfItemAtPath:input.path error:&error];

        // Determine MB/s
        if (error != nil) {
            NSLog(@"%@", error);
            abort();
        }

        double throughput = ((NSNumber *)[attrs valueForKey:NSFileSize]).doubleValue / [end timeIntervalSinceDate:start] / 1e6;
        NSLog(@"Throughput %f MB/s", throughput);
    }
    return 0;
}

Swift:

import Foundation

var input = NSURL(fileURLWithPath: Process.arguments[1])!
var error: NSError?
if !input.checkResourceIsReachableAndReturnError(&error) {
    println(error)
    abort()
}

class MyDelegate: NSObject, NSXMLParserDelegate {
    var didStartElement = 0
    var didEndElement = 0
    var foundCharacters = 0

    func parser(parser: NSXMLParser, didStartElement elementName: String, namespaceURI: String?, qualifiedName qName: String?, attributes attributeDict: [NSObject : AnyObject]) {
        didStartElement += 1
    }

    func parser(parser: NSXMLParser, didEndElement elementName: String, namespaceURI: String?, qualifiedName qName: String?) {
        didEndElement += 1
    }

    func parser(parser: NSXMLParser, foundCharacters string: String) {
        foundCharacters += 1
    }
}


var parser = NSXMLParser(contentsOfURL: input)!
println(input)
var delegate = MyDelegate()
parser.delegate = delegate

var start = NSDate()
parser.parse()
var end = NSDate()

println("Done. #didStartElement: \(delegate.didStartElement), #didEndElement \(delegate.didEndElement), #foundCharacters \(delegate.foundCharacters)")

// Determine MB/s
var attrs = NSFileManager.defaultManager().attributesOfItemAtPath(input.path!, error: &error)
if error != nil {
    println(error!)
    abort()
}
var throughput = Double(attrs![NSFileSize]! as Int) / end.timeIntervalSinceDate(start) / 1e6
println("Throughput \(throughput) MB/s")

Much of the performance is lost in casting. Compare the throughput for different type declarations of the attributes argument in didStartElement:

NSDictionary: 19 MB/s
[NSObject: AnyObject]: 8.5 MB/s
[String: String]: 1.4 MB/s

According to the Counter Instrument, about 36% of the time is spent converting the dictionary to Swift (using [NSObject: AnyObject]).

[Screenshot: Swift NSXMLParser Counter Instrument Results]

Because the attributes of the nodes are needed for further processing, casting them to Swift String can't be avoided. How can I still get decent processing performance in Swift?

Update

When using libxml2's SAX parser directly in C, the throughput is around 110 MB/s, so there really is a performance issue here.

Bouke

2 Answers


I would suggest using a SAX-style parser with a C API (libxml2, for instance) as the underlying XML parser. Creating NSDictionary objects, and especially NSString objects, is unduly expensive (and unnecessary as well, IMHO). We can save at least these costs by creating Swift containers and Swift Strings directly from the data structures obtained from the XML parser.

I do not know, however, how expensive it is to create Swift Strings and Swift Dictionaries. Swift and its library are still in their infancy.

Edit

See also "How to use the CoreAudio API in Swift" and "Function callback from C to Swift".
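To make the idea of building Swift Strings straight from the parser's data concrete, here is a minimal sketch. It assumes the C side hands over a pointer to UTF-8 bytes plus a length, which is how libxml2's SAX characters callback reports text (the buffer is not NUL-terminated). The helper name makeString(fromUTF8:count:) is hypothetical, the byte array merely stands in for memory owned by the C parser, and the code uses current Swift spellings rather than the Swift 1.x ones in the question.

```swift
// Hypothetical helper: builds a Swift String from a length-delimited UTF-8
// buffer, such as the (pointer, length) pair a C SAX callback provides.
func makeString(fromUTF8 bytes: UnsafePointer<UInt8>, count: Int) -> String {
    // Wrap the raw pointer so it can be treated as a collection of code units.
    let buffer = UnsafeBufferPointer(start: bytes, count: count)
    // Decode as UTF-8; invalid sequences are replaced with U+FFFD.
    return String(decoding: buffer, as: UTF8.self)
}

// Stand-in for parser-owned memory: the text "héllo" followed by bytes that
// must not be read, because the buffer carries no NUL terminator.
let raw: [UInt8] = Array("héllo<ignored>".utf8)

let text = raw.withUnsafeBufferPointer { buf in
    // "héllo" occupies 6 UTF-8 bytes (the "é" takes two).
    makeString(fromUTF8: buf.baseAddress!, count: 6)
}

print(text)
```

Only the bytes inside the given length are copied into the String, so no intermediate NSString is created. Whether this is actually faster than the bridged path would have to be measured.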

CouchDeveloper
  • Using libxml's SAX parser requires defining callbacks. Currently it is not possible to create C Function Pointer from Swift functions. So is this even possible, bypassing Objective C and bridging? – Bouke Oct 19 '14 at 15:33
  • I fear, passing a pointer to a Swift function which gets called in C code is not yet supported. This problem needs to be investigated. There's an interesting observation: http://stackoverflow.com/questions/24107099/function-callback-from-c-to-swift. – CouchDeveloper Oct 19 '14 at 15:43
  • Thanks! While it is currently not possible to implement libxml's sax parser directly from Swift, it is possible to use an ObjC wrapper for it. By passing the character pointers to Swift, there's not much bridging overhead from ObjC to Swift. – Bouke Nov 01 '14 at 09:44

You can work around the conversion overhead for now by declaring the dictionary arguments as NSDictionary instead of a Swift dictionary.
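A rough sketch of what this could look like on the consuming side: the dictionary stays an NSDictionary and only the values that are actually needed are bridged to String. The helper attribute(_:in:) and the sample keys are hypothetical, the code assumes an Apple platform where Foundation bridges NSString to String, and it uses current Swift spellings. It shows the lookup step only, not the delegate method itself.

```swift
import Foundation

// Hypothetical helper: bridges a single attribute value to a Swift String,
// leaving every other entry in the dictionary as a Foundation object.
func attribute(_ name: String, in attributes: NSDictionary) -> String? {
    // object(forKey:) returns Any?; the conditional cast bridges one value.
    return attributes.object(forKey: name) as? String
}

// Sample data standing in for the attributes the parser would pass along.
let attributes: NSDictionary = ["id": "42", "lang": "en"]

let id = attribute("id", in: attributes)
let missing = attribute("href", in: attributes)
```

This avoids converting the whole dictionary up front, and the per-string conversion cost is paid only for the keys that are read.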

Catfish_Man
  • That would only defer the casting to somewhere else, where the dictionary values are being used. They need to be cast to String somewhere. – Bouke Oct 19 '14 at 16:07
  • It would avoid the dictionary conversion, not the string conversion. – Catfish_Man Oct 19 '14 at 16:47