20

I have been picking and probing at Swift standard libraries sort() function for its Array type. To my surprise I have noticed it performs poorly on already-sorted data.

Sorting an array of Int which is shuffled seems to be 5x faster than sorting that very same array when it is already sorted. Sorting an array of shuffled objects is about 4x faster than sorting the very same one already in sorted order (sorting object array vs Int array use different algorithms I am sure so I sorted both to eliminate bias).

These are the results:

Shuffled Int array sort time: 1.3961209654808
Shuffled ColorObject array sort time: 3.14633798599243
NOnshuffled Int array sort time: 7.34714204072952
NOnshuffled ColorObject array sort time: 10.9310839772224

For reference below is my code:

class ElapsedTimer {

    let startTime: CFAbsoluteTime
    var endTime: CFAbsoluteTime?

    init() {
        startTime = CFAbsoluteTimeGetCurrent()
    }

    func stop() -> CFAbsoluteTime {
        endTime = CFAbsoluteTimeGetCurrent()
        return duration!
    }

    var duration: CFAbsoluteTime? {
        if let endTime = endTime {
            return endTime - startTime
        } else {
            return nil
        }
    }
}

public class CountedColor {
    public private(set) var count: Int
    public private(set) var color: UIColor

    public init(color: UIColor, colorCount: Int) {
        self.count = colorCount
        self.color = color
    }
}

var distributedIntArray = [Int]()

for value in 1..<1000000 {
    distributedIntArray.append(value)
}

var distributedCountedColorArray = distributedIntArray.map{ CountedColor(color: UIColor.white, colorCount: $0) }

distributedCountedColorArray.shuffle()
distributedIntArray.shuffle()

var timer = ElapsedTimer()
distributedIntArray.sort()
print("Shuffled Int array sort time: \(timer.stop())")
timer = ElapsedTimer()

distributedCountedColorArray.sort{ return $0.count < $1.count }
print("Shuffled Color array sort time: \(timer.stop())")
timer = ElapsedTimer()

distributedIntArray.sort()
print("NOnshuffled Int array sort time: \(timer.stop())")
timer = ElapsedTimer()

distributedCountedColorArray.sort{ return $0.count < $1.count }
print("Non shuffled Color array sort time: \(timer.stop())")

My array shuffle() method was pulled from this post. My ElapsedTimer simply wraps and uses CACurrentMediaTime() functions.

My question is why am I seeing this behavior? Especially when I am sorting the object array which should surely be using a general purpose sort. What kind of general purpose sorting algorithm is swift using? It surely can’t be one where the worst case and average case are the same like mergeSort.

Chris
  • 4,009
  • 3
  • 21
  • 52
AyBayBay
  • 1,726
  • 4
  • 18
  • 37
  • Asking "why" Swift chooses a certain implementation does not make a good objective question - rationale questions rarely do as objective supportive evidence is usually hard to find. Removing the extra cruft will make for a more tidy question. The objective question is "What sort algorithm does Swift use? (What performance can be expected (in xyz cases)?)" – user2864740 Dec 08 '16 at 03:10
  • 3
    I believe asking "why" to be a totally valid question. Understanding why an implementation is chosen is just as important if not more, all decisions should be justifiable. That being said I can re-word the question. – AyBayBay Dec 08 '16 at 03:13
  • 1
    Swift is open source, so if you really want to know what sort algorithm they are using, why don't you look and see? As for your actual question, "why would" is not a good fit for Stack Overflow; it's a matter of opinion why Apple did this or that. – matt Dec 08 '16 at 03:14
  • What is `CountedColor`? – Yuchen Dec 08 '16 at 03:15
  • @YuchenZhong let me edit the question/code snippet to include that – AyBayBay Dec 08 '16 at 03:18
  • @matt ok I understand what you are getting at, let me change the wording so its less of an opinionated/philosophical question. – AyBayBay Dec 08 '16 at 03:22
  • Possible duplicate of [Swift sorting algorithm implementation](http://stackoverflow.com/questions/27677026/swift-sorting-algorithm-implementation) – Code Different Dec 08 '16 at 03:30
  • And your implementation of `ElapsedTimer ` is? – Yuchen Dec 08 '16 at 03:52

3 Answers3

30

Swift uses Introsort. Looking at the source code we see that the chosen pivot is the first element. The wikipedia page on Introsort says:

(...), one of the critical operations is choosing the pivot: the element around which the list is partitioned. The simplest pivot selection algorithm is to take the first or the last element of the list as the pivot, causing poor behavior for the case of sorted or nearly sorted input.

Thus it is entirely predictable, given the implementation choice, that Swift's sorting performance is worst for sorted inputs.

I have built a complete benchmark for people who want to easily reproduce the OP's claims : https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/extra/swift/sort

For reference, the GNU ISO C++ standard library uses a median-of-3 pivot (as per the stl_algo.h header).

Daniel Lemire
  • 3,470
  • 2
  • 25
  • 23
  • Follow up: if choosing the first item as pivot is a bad choice, what would be a better implementation? Wikipedia mention median of 3 with some caveats. Seems to me it would be harder to come up with a worst case with median of 3 versus first item though. – Emmanuel Oga Dec 10 '16 at 01:22
21

In the Swift 5 evolution, the IntroSort algorithm was replaced with a modified version TimSort (implemented first in 2002 by Tim Peters for Python) in the 'sort()' method: https://github.com/apple/swift/blob/master/stdlib/public/core/Sort.swift

Soheil Novinfard
  • 1,358
  • 1
  • 16
  • 43
5

This is a duplicate of this question it seems: Swift sorting algorithm implementation

Furthermore the reason why it performs so much better when shuffled is probably just because when its shuffled its performance is not hitting the upper bound of NlogN. Sorting the sorted array probably gets closer to that upper limit so its still all in all the same sort. But I dont know this is just theory

Community
  • 1
  • 1
AyBayBay
  • 1,726
  • 4
  • 18
  • 37
  • this is incorrect, the performance of swift’s sort on an already sorted array larger than a few down elements is `O(n**2)`. On a *shuffled* array it comes down to `O(n*log(n))`. – taylor swift Jan 05 '17 at 17:16
  • By O(n**2) I assume you mean N squared. How can that be swifts sort is an intro sort, and introSorts worst case complexity is NlogN. – AyBayBay Jan 05 '17 at 17:18
  • As evidenced by the OP’s question, it appears swift’s sort does not behave like a proper introsort, or else it would not be vulnerable to ordered input. Perhaps the depth threshold is set very high that it predominantly exhibits quicksort behavior. – taylor swift Jan 05 '17 at 17:24
  • Potentially or the pivot selection is poor. – AyBayBay Jan 05 '17 at 17:29
  • pivot selection only changes which kind of patterned input the sort is vulnerable to. Though last I heard, swift was using the first element, which seems like an extraordinarily *bad* choice considering how common simple-sorted input is… – taylor swift Jan 05 '17 at 17:33
  • I believe last time I checked the open source code it was using the first element. The recursion depth threshold was not some aggregious value so it probably has to be their pivot selection. It would be easy to fix that I believe on their end – AyBayBay Jan 05 '17 at 17:57
  • in that case it might be good to bring it up on swift dev because i don't think it’s been mentioned, though i believe they're a bit tied up rn fixing crashers for 3.1 – taylor swift Jan 05 '17 at 17:59
  • Hmm will do thanks for the suggestion. Maybe I can submit a PR fixing it? I don't know how long it takes for them to approve code pull request and which route would be faster fixing myself or pinging them about it – AyBayBay Jan 05 '17 at 18:05
  • they are usually pretty quick but i don't know if they are accepting changes at this phase in the release cycle atm – taylor swift Jan 05 '17 at 18:07
  • Do they generally announce when they are accepting changes? – AyBayBay Jan 05 '17 at 18:20
  • The implementation is here if anyone is interested: https://github.com/apple/swift/blob/02e2bd5380af69948d2324b936bfc61e1454d8ea/stdlib/public/core/Sort.swift.gyb#L232-L301 – Steven Hepting Jun 23 '17 at 02:31