
I see a large discrepancy between the time it takes to process (read and parse) a CSV file inside a unit test and the time the exact same code takes to read and parse the same file from the same location when the app runs in the simulator or on a device. The difference is roughly 8x. In both cases the code runs on the main thread.

Edit: The unit test is the much faster one. I measured this by logging timestamps around the method call in both places.
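
For reference, a minimal sketch of the kind of timing wrapper that could be placed around the call in both the test and the app, so the two numbers are taken the same way. The name `measure` and the stand-in workload are hypothetical and not part of the original code; only `DispatchTime.now().uptimeNanoseconds` is a real API here.

```swift
import Dispatch

/// Hypothetical helper: runs `work` once, prints the elapsed wall-clock time,
/// and returns both the result and the elapsed seconds.
func measure<T>(_ label: String, _ work: () throws -> T) rethrows -> (result: T, seconds: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = try work()
    let end = DispatchTime.now().uptimeNanoseconds
    let seconds = Double(end - start) / 1_000_000_000
    print("\(label): \(seconds) s")
    return (result, seconds)
}

// Stand-in workload; in the question this would be the processCSV call.
let timed = measure("sum") { (0..<1_000_000).reduce(0, +) }
```

Using one helper in both places at least rules out differences in how the timestamps themselves are captured.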

More findings, which may be important: the function being called dispatches work to a number of threads and then waits for all of them to complete. It splits an array of 180,000 rows into chunks and processes each chunk asynchronously (using DispatchQueue.global().async and DispatchGroup).

In the app, performance degrades depending on the number of threads, while in a unit test it stays fairly similar.

    func processCSV(fileName: String, columnToGet: [String]?, block: (([String:String]) -> ())? = nil) throws -> ([[String:String]]) {
        var userReport = [[String:String]?]()
        var wmsEntries = 0
        do {
            let data = try String(contentsOfFile: fileName, encoding: .utf8)
            var myStrings = data.components(separatedBy: .newlines)
            let headerRow = myStrings.first?.components(separatedBy: ",")
            let headerCount = headerRow?.count ?? 0
            var headerColumnMap = [String:Int]()
            try columnToGet?.forEach({ column in
                guard let index = headerRow?.firstIndex(where: {$0.compare(column, options: .caseInsensitive) == .orderedSame}) else {
                    throw NSError(domain: "Unexpected or invalid header in csv file.", code: NSFileReadCorruptFileError, userInfo: nil)
                }
                headerColumnMap[column] = index
            })
            myStrings = Array(myStrings.dropFirst()).filter({!$0.isEmpty})
            wmsEntries = myStrings.count
            userReport = [[String:String]?](repeating: nil, count: wmsEntries)
            let dispatchGroup = DispatchGroup()

            func insert(_ record: [Substring], at: Int) {
                var entry = [String:String]()
                headerColumnMap.forEach({ key, value in
                    entry[key] = record[value].trimmingCharacters(in: .whitespacesAndNewlines)
                })
                DispatchQueue.global().async {
                    block?(entry)
                }
                userReport[at] = entry
            }

            let chunkSize = max(1000, myStrings.count / 9)
            for (chunkIndex, chunk) in myStrings.chunked(into: chunkSize).enumerated() {
                dispatchGroup.enter()
                DispatchQueue.global().async {
                    for (counter, str) in chunk.enumerated() {
                        let data = self.parse(line: str)
                        let insertIndex = chunkIndex * chunkSize + counter
                        guard data.count == headerCount, data.count > columnToGet?.count ?? 0 else {
                            DDLogError("Error in file, mismatched number of values, on line \(myStrings[chunkIndex * chunkSize + counter])")
                            continue
                        }
                        insert(data, at: insertIndex)
                    }
                    dispatchGroup.leave()
                }
            }
            dispatchGroup.wait()
        } catch {
            print(error)
        }
        let filtered = userReport.filter({$0 != nil}) as! [[String:String]]
        self.numberLinesWithError = wmsEntries - filtered.count
        return filtered
    }
kos
  • Which is faster? How do you know? Have you instrumented the code? Have you used the Time Profiler? – matt Aug 02 '22 at 20:23
  • Tests default to building a release (optimized) build. Are you doing a debug (unoptimized) build when running on the simulator/device, or a release (optimized) build? Debug builds can, depending upon the code, be much slower. – Rob Aug 02 '22 at 21:13
  • @Rob fair enough. Was thinking it could be something more generic. – kos Aug 03 '22 at 01:39
  • Were your simulator/device runs performed using a release build or the default debug build? – Rob Aug 03 '22 at 04:09
  • Unrelated to the question at hand (why tests are faster than app), you should note that if you have more than 64 blocks that you are dispatching to the global queue, that would be very bad (it’s called “thread explosion” and can exhaust the GCD worker thread pool and cause all sorts of unexpected problems). Also make sure you're submitting enough work to each thread to offset the overhead of multithreading. See https://stackoverflow.com/a/39949292/1271826 or https://stackoverflow.com/a/59855072/1271826 or https://stackoverflow.com/a/58513549/1271826 or ... – Rob Aug 03 '22 at 05:03
  • Thanks, @Rob, that was my conclusion as well. It was silly, of course, to even attempt this; what threw me off was that the timings in unit tests changed for the worse when I decreased the number of blocks. What you say and what I have come up with since asking the question seem to confirm that. The question, though, is how to correlate unit tests, where it is easy to get timings, with the actual in-app workflow. – kos Aug 04 '22 at 12:06
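
The advice in the comments about limiting the number of dispatched blocks and giving each one enough work could be sketched roughly as below, using `DispatchQueue.concurrentPerform` with one chunk per core. This is only an illustration under assumptions: `parseLine` is a hypothetical stand-in for the question's `parse(line:)` (a naive comma split with no quoting support), `parseAll` and `expectedColumns` are made-up names, and the code assumes the Swift 5 language mode. Each iteration writes only to its own slot of the buffer, so the shared array is never mutated from two threads at once.

```swift
import Foundation

/// Hypothetical stand-in for the question's `parse(line:)`: a naive comma split.
func parseLine(_ line: String) -> [Substring] {
    line.split(separator: ",", omittingEmptySubsequences: false)
}

/// Parses `lines` in parallel, using at most one chunk per active core,
/// and drops rows whose field count differs from `expectedColumns`.
func parseAll(_ lines: [String], expectedColumns: Int) -> [[Substring]] {
    guard !lines.isEmpty else { return [] }
    let workers = max(1, min(ProcessInfo.processInfo.activeProcessorCount, lines.count))
    let chunkSize = (lines.count + workers - 1) / workers
    let chunkCount = (lines.count + chunkSize - 1) / chunkSize
    var perChunk = [[[Substring]]](repeating: [], count: chunkCount)

    perChunk.withUnsafeMutableBufferPointer { buffer in
        // concurrentPerform blocks until every iteration has finished,
        // so no DispatchGroup is needed.
        DispatchQueue.concurrentPerform(iterations: chunkCount) { chunkIndex in
            let start = chunkIndex * chunkSize
            let end = min(start + chunkSize, lines.count)
            var rows = [[Substring]]()
            rows.reserveCapacity(end - start)
            for line in lines[start..<end] {
                let fields = parseLine(line)
                if fields.count == expectedColumns { rows.append(fields) }
            }
            // Each iteration owns exactly one slot, so writes never overlap.
            buffer[chunkIndex] = rows
        }
    }
    // Chunks are stored by index, so the original row order is preserved.
    return perChunk.flatMap { $0 }
}

let parsedRows = parseAll(["a,b,c", "1,2", "x,y,z"], expectedColumns: 3)
```

Because `concurrentPerform` is synchronous, it should not be called from the main thread in a real app for long-running work; the sketch only shows the chunking and the per-slot writes.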
