2

I have been picking and probing at Swift Array's sort function and I was surprised to find that in-place sorting an array of 500,000 integers was remarkably faster than sorting an array of 500,000 tuples (5x faster). I was surprised by this finding because after asking a previous question on Stack overflow and digging through Swift open source, it seems that Swift simply uses an introSort() for its sorting algorithm. I tried to parse through the code to see if it handled integer sorting in any different/special way but I could not find anything. Time complexity wise IntroSort() performs NlogN avg and worst case so my tuple sort and int sort should behave the same. If you do the math the 5x speed up begins to look like linear vs NlogN time complexity.

This got me thinking, maybe Swift IS using a linear-time integer sort algorithm to handle integer sorting and maybe I just missed it. Now I am NOT an expert in integer sorting but it seems like the most viable candidates for integer sorting algorithms are pigeonhole sort, countingSort, and RadixSort. Right off the bat I ruled out Radix sort because its time complexity is O(wn) where w is word/int size. This means the (logN) term in a general purpose sort needs to be greater than w which would necessitate us to be sorting a ridiculously large list. Looking at the performance of the int sort vs tuple sort, I deduced radix sort couldnt possibly be in use here.

By order of elimination this left CountingSort and Pigeonhole Sort. They both behave worst case O(N + k) where N is the number of elements and k is the range of values. So to determine whether Swift's algorithm was one of these sorts, I thought it sufficed to compare the sort of 500,000 integers with smaller key values with the sorting of 500,000 integers with a huge range of keys. For the first array I picked the key range of [1, 500,000]. For the second array I picked the key range of [1, 6,000,500,000]. You would expect the second array's sort to take significantly longer than the first if it used either counting sort or pigeonhole sort (just do the math) but it did NOT. To my surprise the performance was the exact same! Below is my sample code and performance results:

 func testTupleSortVsIntegerSort()
    {
        var intArray = [Int]()
        for value in 1..<500000 {
            intArray.append(value)
        }

        var tupleArray = intArray.map{($0, "sentinelValue")}
        intArray.shuffle()
        tupleArray.shuffle()

        var startTime = CFAbsoluteTimeGetCurrent()
        intArray.sort()
        var endTime = CFAbsoluteTimeGetCurrent()
        print("Integer sort took \(endTime - startTime) seconds")

        startTime = CFAbsoluteTimeGetCurrent()
        tupleArray.sort {
            return $0.0 < $1.0
        }
        endTime = CFAbsoluteTimeGetCurrent()
        print("Tuple sort took \(endTime - startTime) seconds")

    }

    func testIntegerSortWithSmallKeyRangeVsVeryLargeKeyRange()
    {

        var intArrayWithSmallKeys = [Int]()
        for value in 1..<500000 {
            intArrayWithSmallKeys.append(value)
        }

        var intArrayWithLargeKeys: [Int] = intArrayWithSmallKeys.map {
            if $0 < 250000 {
                return $0
            } else if $0 < 350000 {
                return $0 + 50000000
            } else {
                return $0 + 6000000000
            }
        }

        intArrayWithSmallKeys.shuffle()
        intArrayWithLargeKeys.shuffle()

        var startTime = CFAbsoluteTimeGetCurrent()
        intArrayWithSmallKeys.sort()
        var endTime = CFAbsoluteTimeGetCurrent()
        print("Integer sort with small keys took \(endTime - startTime) seconds")

        startTime = CFAbsoluteTimeGetCurrent()
        intArrayWithLargeKeys.sort()
        endTime = CFAbsoluteTimeGetCurrent()
        print("Integer sort with large keys took \(endTime - startTime) seconds")

    }

Test Case '-[ColorExtractorTests.GeneralPerformanceTest testTupleSortVsIntegerSort]' started.
Integer sort took 0.659438014030457 seconds
Tuple sort took 3.40226799249649 seconds

Test Case '-[ColorExtractorTests.GeneralPerformanceTest testIntegerSortWithSmallKeyRangeVsVeryLargeKeyRange]' started.
Integer sort with small keys took 0.665884971618652 seconds
Integer sort with large keys took 0.669007003307343 seconds

What is going on here? Is there some magical integer sorting algorithm I do not know of? Is my understanding of these sorting algorithms wrong or could I have some flaw in my analysis?

AyBayBay
  • 1,726
  • 4
  • 18
  • 37
  • Swift uses introsort, compare [Swift sorting algorithm implementation](http://stackoverflow.com/questions/27677026/swift-sorting-algorithm-implementation). – Martin R Dec 08 '16 at 22:01
  • @MartinR in my question I state that I already had a look. I know it seems to be introSort but my findings state otherwise. If it were using introSort the performance should be the same for my scenario I posted but its not. Thats why I am wondering how Swift is sorting INT's in particular, and if anyone knows if it handles them differently. I parsed the code and did not see Swift handling integer sorting as a special case so I am perplexed. – AyBayBay Dec 08 '16 at 22:03
  • Just a note - you can have two algorithms, both with `O(n*log n)` time complexity and still one can be 5 times faster than the other on 500,000 elements. – Honza Dejdar Dec 08 '16 at 22:06
  • I did not read your question correctly, I apologize. – I don't think there is a separate algorithm for integers. But for example comparing and swapping tuples could be slower. Also your tuples contain a string, so that refcounting is done. You could try to profile with Instruments to see where the time is spent. – Martin R Dec 08 '16 at 22:13
  • Also sorting with a custom closure seems to be much slower, try `intArray.sort { $0 < $1 }`. – Martin R Dec 08 '16 at 22:15
  • @MartinR I changed the second value from a string to an int and that sped up the routine considerably. Now the int sort is 0.7 and the tuple Sort is 0.9 Also what is the difference between my sort function and yours? I cant tell. I switched it to use your closure and it sped up from 0.92 to 0.88 (so slight speed up that would probably expand as sheer data sorted increased) – AyBayBay Dec 08 '16 at 22:18
  • My *guess* would be that the compiler optimizes the code better when sorting with the built-in `<` operator for integers, whereas calling a custom closure takes more time. – Martin R Dec 08 '16 at 22:23

1 Answers1

0

Please see responses to question from Martin R and Honza Denjar. These insites helped me realize what was going on. The data you sort heavily impacts performance not JUST number of elements being sorted. Sometimes this lesson gets forgotten.

AyBayBay
  • 1,726
  • 4
  • 18
  • 37