I see a large discrepancy between the time it takes to process (read and parse) a CSV file inside a unit test and the time exactly the same code takes to read and parse the same file from the same location when running in the simulator or on a device: roughly an 8x difference. In both cases the code runs on the main thread.
Edit: The unit test is the much faster one. I measured this by logging the time around the method call in both places.
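For reference, a minimal sketch of the kind of timing wrapper that can be placed around the call in both the unit test and the app. The `measure` helper and its label parameter are hypothetical names used for illustration, not part of the original code; it also prints whether the call runs on the main thread, so the two environments can be compared on equal terms.

```swift
import Foundation
import Dispatch

/// Hypothetical helper (not in the original code): runs `work`, prints the
/// elapsed time and whether we are on the main thread, and returns both the
/// result and the elapsed seconds.
func measure<T>(_ label: String, _ work: () throws -> T) rethrows -> (result: T, seconds: Double) {
    // Monotonic clock, so the measurement is not affected by wall-clock changes.
    let start = DispatchTime.now().uptimeNanoseconds
    let result = try work()
    let end = DispatchTime.now().uptimeNanoseconds
    let seconds = Double(end - start) / 1_000_000_000
    print("\(label): \(seconds) s, main thread: \(Thread.isMainThread)")
    return (result, seconds)
}
```

Usage would look like `let (rows, seconds) = try measure("processCSV") { try processCSV(fileName: path, columnToGet: columns) }`, where `path` and `columns` stand in for whatever the caller already passes.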
More findings, which may be important: the function being called creates a number of threads and then waits for all of them to complete. It splits an array of 180000 rows into chunks and processes each chunk asynchronously (using DispatchQueue.global().async and DispatchGroup).
In the app, performance degrades depending on the number of threads, while in the unit test it stays fairly similar.
func processCSV(fileName: String, columnToGet:[String]?, block: (([String:String]) ->())? = nil) throws -> ([[String:String]]) {
    var userReport = [[String:String]?]()
    var wmsEntries = 0
    do {
        let data = try String(contentsOfFile: fileName, encoding: .utf8)
        var myStrings = data.components(separatedBy: .newlines)
        let headerRow = myStrings.first?.components(separatedBy: ",")
        let headerCount = headerRow?.count ?? 0
        var headerColumnMap = [String:Int]()
        try columnToGet?.forEach({ column in
            guard let index = headerRow?.firstIndex(where: {$0.compare(column, options: .caseInsensitive) == .orderedSame}) else {
                throw NSError(domain: "Unexpected or invalid header in csv file.", code: NSFileReadCorruptFileError, userInfo:nil )
            }
            headerColumnMap[column] = index
        })
        myStrings = Array(myStrings.dropFirst()).filter({!$0.isEmpty})
        wmsEntries = myStrings.count
        userReport = [[String:String]?](repeating: nil, count: wmsEntries)
        let dispatchGroup = DispatchGroup()
        func insert(_ record:[Substring], at:Int) {
            var entry = [String:String]()
            headerColumnMap.forEach({ key, value in
                entry[key] = record[value].trimmingCharacters(in: .whitespacesAndNewlines)
            })
            DispatchQueue.global().async {
                block?(entry)
            }
            userReport[at] = entry
        }
        let chunkSize = max(1000, myStrings.count / 9)
        for (chunkIndex, chunk) in myStrings.chunked(into: chunkSize).enumerated() {
            dispatchGroup.enter()
            DispatchQueue.global().async {
                for (counter, str) in chunk.enumerated() {
                    let data = self.parse(line: str)
                    let insertIndex = chunkIndex * chunkSize + counter
                    guard data.count == headerCount, data.count > columnToGet?.count ?? 0 else {
                        DDLogError("Error in file, mismatched number of values, on line \(myStrings[chunkIndex * chunkSize + counter])")
                        continue
                    }
                    insert(data, at: insertIndex)
                }
                dispatchGroup.leave()
            }
        }
        dispatchGroup.wait()
    } catch {
        print(error)
    }
    let filtered = userReport.filter({$0 != nil}) as! [[String:String]]
    self.numberLinesWithError = wmsEntries - filtered.count
    return filtered
}
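
The function above calls `chunked(into:)`, which is not part of the Swift standard library and is not shown here. A commonly used definition looks roughly like the sketch below; this is an assumption about the helper, not necessarily the exact one in the project.

```swift
extension Array {
    /// Assumed helper: splits the array into consecutive chunks of at most
    /// `size` elements. The last chunk may be shorter than the others.
    func chunked(into size: Int) -> [[Element]] {
        precondition(size > 0, "chunk size must be positive")
        return stride(from: 0, to: count, by: size).map { start in
            // Swift.min is spelled out to avoid resolving to Array's own min().
            Array(self[start..<Swift.min(start + size, count)])
        }
    }
}
```

With this definition and `chunkSize = max(1000, myStrings.count / 9)`, a file of 180000 data rows gives a chunk size of 20000 and therefore 9 chunks, each dispatched to the global queue as its own work item.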