My goal is to parse a large XML file (20 GB) with Swift. There are performance issues both with NSXMLParser and with bridging its output to Swift objects, so I'm looking at multi-threading, specifically the following division of work:
1. Main thread - processes the parsed data it receives from 2.
2. Worker thread - casts ObjC types into Swift types and sends them to 1. Casting `NSDictionary` to `[String: String]` is the largest bottleneck and the main reason for splitting the work across threads.
3. Worker thread - parses the XML into ObjC types and sends them to 2. NSXMLParser is a push parser: once it starts parsing, it cannot be paused.
The data must be processed sequentially, so the input order has to be preserved. My idea is to run an NSRunLoop on both 1 and 2, allowing parallel processing without blocking. According to Apple's documentation, communication between the threads can be achieved by calling `performSelector:onThread:withObject:waitUntilDone:`. However, this method is not available in Swift.
I don't think GCD fits here: both worker threads should be long-running, with new work arriving at irregular intervals.
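A minimal sketch of the hand-off described above, assuming that a plain Foundation `Thread` draining an `NSCondition`-guarded FIFO queue is an acceptable substitute for "run loop plus `performSelector:onThread:...`". `SerialWorkerThread`, `submit`, `stopAndWait` and `runPipelineDemo` are hypothetical names, and the `[String: Any]` dictionaries are placeholders for real NSXMLParser output.

```swift
import Foundation

/// Hypothetical helper (not an Apple API): a long-running thread that runs
/// submitted jobs one at a time, strictly in submission (FIFO) order.
final class SerialWorkerThread: @unchecked Sendable {  // all state guarded by `condition`
    private let condition = NSCondition()
    private var jobs: [() -> Void] = []
    private var isStopping = false
    private var isFinished = false

    init(name: String) {
        let thread = Thread { self.processJobs() }
        thread.name = name
        thread.start()
    }

    /// Enqueue a job from any thread; returns immediately.
    /// Submitting after stopAndWait() is not supported in this sketch.
    func submit(_ job: @escaping () -> Void) {
        condition.lock()
        jobs.append(job)
        condition.broadcast()
        condition.unlock()
    }

    /// Let the worker drain its queue, then block until its thread exits.
    func stopAndWait() {
        condition.lock()
        isStopping = true
        condition.broadcast()
        while !isFinished { condition.wait() }
        condition.unlock()
    }

    private func processJobs() {
        while true {
            condition.lock()
            // Sleep until work arrives or a stop is requested.
            while jobs.isEmpty && !isStopping { condition.wait() }
            if jobs.isEmpty {  // stop requested and queue fully drained
                isFinished = true
                condition.broadcast()
                condition.unlock()
                return
            }
            let batch = jobs  // take everything queued so far, in order
            jobs = []
            condition.unlock()
            for job in batch { job() }  // run outside the lock so submit() never waits on a job
        }
    }
}

/// Stage 3 "parses", stage 2 converts loosely typed dictionaries to
/// [String: String], stage 1 (the calling thread) consumes the results.
func runPipelineDemo(count: Int) -> [[String: String]] {
    let converter = SerialWorkerThread(name: "convert")  // stage 2
    let parser = SerialWorkerThread(name: "parse")       // stage 3
    let resultLock = NSLock()
    var results: [[String: String]] = []

    for i in 0..<count {
        parser.submit {
            // Placeholder for parser output: values typed as Any.
            let raw: [String: Any] = ["id": String(i), "name": "item\(i)"]
            converter.submit {
                let typed = raw.compactMapValues { $0 as? String }
                resultLock.lock()
                results.append(typed)
                resultLock.unlock()
            }
        }
    }
    parser.stopAndWait()     // every record has now been handed to stage 2
    converter.stopAndWait()  // every record has now been converted
    return results           // safe to read: both workers have exited
}

let records = runPipelineDemo(count: 5)
print(records.map { $0["id"]! })  // ["0", "1", "2", "3", "4"]
```

Each worker sleeps on its condition until work arrives, so it can stay alive for the whole parse and accept jobs at irregular intervals. Because every stage has a single thread and a FIFO queue, record order is preserved end to end while stages 2 and 3 still run in parallel.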
How can one achieve the above (e.g. NSRunLoops on multiple threads) using Swift?