7

I am trying to build a data structure in Swift that maps an Integer to an array of objects (a dictionary with int as key and array as value). The objects are extremely small, and they simply wrap a UIColor and an Int. I have two implementations one that uses a Swift array as the Dictionary's value type, while the other uses a NSMutableArray as the value type. My objective-C code performs extremely fast, but my Swift code is running egregiously slow. Ideally, I would not like to use an NSMutableArray, and would like to keep it as a Swift array. Reason for this is I am writing an algorithm and performance matters, I have noticed some overhead with objC_msgSend. Can anyone help me optimize my Swift code? Am I doing something wrong or is this just a byproduct of swift treating array's as value types? If it is, I would like to understand why the value type performs so slow in this case, what my options are, and how can this scenario can scale going forward? Below I have posted a code snippet and the resulting benchmarks:

Swift Array Code:

let numColors = colorCount(filter: filter, colorInfoCount: colorInfo.count)
var colorCountsArray: [Int] = [Int]()
var countToColorMap: [Int:[CountedColor]] = [Int:[CountedColor]](minimumCapacity: capacity)
var topColors = [CountedColor]()

var startTime = CACurrentMediaTime()
for (color, colorCount) in colorInfo {
    colorCountsArray.append(colorCount)
    if countToColorMap[colorCount] != nil {
        countToColorMap[colorCount]?.append(CountedColor(color: color, colorCount: colorCount))
    } else {
        countToColorMap[colorCount] = [CountedColor(color: color, colorCount: colorCount)]
    }
}
var endTime = CACurrentMediaTime()
print("Time after mapping: \(endTime - startTime)")

Swift Performance:

Time after mapping: 45.0881789259997

NSMutableArray code:

let numColors = colorCount(filter: filter, colorInfoCount: colorInfo.count)
var colorCountsArray: [Int] = [Int]()
var countToColorMap: [Int:NSMutableArray] = [Int:NSMutableArray](minimumCapacity: capacity)
var topColors = [CountedColor]()


var startTime = CACurrentMediaTime()
for (color, colorCount) in colorInfo {
    colorCountsArray.append(colorCount)
    if countToColorMap[colorCount] != nil {
        countToColorMap[colorCount]?.add(CountedColor(color: color, colorCount: colorCount))
    } else {
        countToColorMap[colorCount] = NSMutableArray(object: CountedColor(color: color, colorCount: colorCount))
    }
}
var endTime = CACurrentMediaTime()
print("Time after mapping: \(endTime - startTime)")

NSMutableArray Performance:

Time after mapping: 0.367132211999888

The colorInfo object is a dictionary mapping UIColor objects to an Integer value representing a count. The code essentially reverse maps this, mapping an integer to a UIColor array (its an array because multiple Colors can have the same count). The colorInfo has 60,000 UIColor, Int key value pairs inside of it.

mfaani
  • 33,269
  • 19
  • 164
  • 293
AyBayBay
  • 1,726
  • 4
  • 18
  • 37
  • Have you run it through a time profiler? Might reveal something. – Andreas Dec 11 '16 at 21:37
  • Can you post an example project that runs (and a way to generate data)? I've been experimenting with this, and my relatively naive implementation is dramatically faster (a couple of orders of magnitude faster than your NSMutableArray version), but I don't know where your real bottleneck is and if I'm actually doing the same thing or not. You're making a surprising number of objects here. Why is this `[Int: [CountedColor]]` rather than jus `[Int: UIColor]`? Seems a big duplication of time and space. Is `CountedColor` a class or a struct? (I really don't get the `CountedColor` thing :D) – Rob Napier Dec 13 '16 at 00:15
  • Or maybe provide example data so we can run the code without contriving data? – Brandon Deo Dec 14 '16 at 14:40
  • 1
    @RobNapier sorry it took a while but clone the github project here: https://github.com/klayThompson91/ColorProcessorExperiments I just hacked something up and pushed it up quick so the repo structure is kinda weird. But once you clone go into the ColorExtractor folder and open the XCode project. In there you can find the code in question inside the file "ColorProcessor.swift"... Look for the code comments "//The dictionary with a value array" Also, run the "testAverageCasePerformance_mostFrequentColors()" test inside of GeneralPerformanceTests to reproduce my benchmark result data. – AyBayBay Dec 14 '16 at 23:04
  • The same comment above goes to you @bdeo I just cant tag two users in a response on here. – AyBayBay Dec 14 '16 at 23:04

2 Answers2

16

Copy on write is a tricky thing, and you need to think carefully about how many things are sharing a structure that you're trying to modify. The culprit is here.

countToColorMap[colorCount]?.append(CountedColor(color: color as! UIColor, colorCount: colorCount))

This is generating a temporary value that is modified and put back into the dictionary. Since two "things" are looking at the same underlying data structure (the dictionary, and append), it forces a copy-on-write.

The secret to fixing this is to make sure that there's only one copy when you modify it. How? Take it out of the dictionary. Replace this:

if countToColorMap[colorCount] != nil {
    countToColorMap[colorCount]?.append(CountedColor(color: color as! UIColor, colorCount: colorCount))
} else {
    countToColorMap[colorCount] = [CountedColor(color: color as! UIColor, colorCount: colorCount)]
}

which has a runtime of:

Elapsed Time: 74.2517465990022
53217

with this:

var countForColor = countToColorMap.removeValue(forKey: colorCount) ?? []
countForColor.append(CountedColor(color: color as! UIColor, colorCount: colorCount))
countToColorMap[colorCount] = countForColor

which has a runtime of:

Elapsed Time: 0.370953808000195
53217
Rob Napier
  • 286,113
  • 34
  • 456
  • 610
  • Thank you very much for taking the time and the in-depth response. This is very interesting and represents a shift in the way I will have to think about implementing data structures in Swift. Strange why they made Dictionaries and arrays value types in general in my opinion, thanks again. – AyBayBay Dec 15 '16 at 03:54
  • My gut reaction is: this sucks. Is there a good (deep?) reason for why it has to be done this way? – Raphael Jan 10 '17 at 17:13
  • 2
    It's the result of many other choices about how Dictionary is implemented, combined with current limitations in the optimizer. You should expect common uses to improve over time, but you will likely always have to understand how copy on write works in large mutable data structures (similar things are true in most languages when performance is important; you usually have to understand how it works underneath). There's no "type theory" or "deep CS" reason for this behavior. It's just how Swift currently works today. – Rob Napier Jan 10 '17 at 17:19
  • 6
    In Swift 4.1 (available with Xcode 9.3 beta), you can now mutate through `Dictionary`'s `subscript(_:default:)` without invoking a copy of the array value in the dictionary :) e.g `countToColorMap[colorCount, default: []].append(CountedColor(color: color as! UIColor, colorCount: colorCount))`. This is because [the subscript now uses an addressor](https://github.com/apple/swift/pull/12752), so can return a pointer to the value in the dictionary's buffer. – Hamish Jan 25 '18 at 16:05
2

I had some work arounds until swift 4.2 came along

var countToColorMap = [Int: [CountedColor]]()

for (color, colorCount) in colorInfo {
    countToColorMap[colorCount, default: [CountedColor]()].append(CountedColor(color: color as! UIColor, colorCount: colorCount))
}

it is fast and readable

Klajd Deda
  • 355
  • 2
  • 7