I have been picking and probing at Swift Array's sort function and I was surprised to find that in-place sorting an array of 500,000 integers was remarkably faster than sorting an array of 500,000 tuples (5x faster). I was surprised by this finding because after asking a previous question on Stack overflow and digging through Swift open source, it seems that Swift simply uses an introSort() for its sorting algorithm. I tried to parse through the code to see if it handled integer sorting in any different/special way but I could not find anything. Time complexity wise IntroSort() performs NlogN avg and worst case so my tuple sort and int sort should behave the same. If you do the math the 5x speed up begins to look like linear vs NlogN time complexity.
This got me thinking, maybe Swift IS using a linear-time integer sort algorithm to handle integer sorting and maybe I just missed it. Now I am NOT an expert in integer sorting but it seems like the most viable candidates for integer sorting algorithms are pigeonhole sort, countingSort, and RadixSort. Right off the bat I ruled out Radix sort because its time complexity is O(wn) where w is word/int size. This means the (logN) term in a general purpose sort needs to be greater than w which would necessitate us to be sorting a ridiculously large list. Looking at the performance of the int sort vs tuple sort, I deduced radix sort couldnt possibly be in use here.
By order of elimination this left CountingSort and Pigeonhole Sort. They both behave worst case O(N + k) where N is the number of elements and k is the range of values. So to determine whether Swift's algorithm was one of these sorts, I thought it sufficed to compare the sort of 500,000 integers with smaller key values with the sorting of 500,000 integers with a huge range of keys. For the first array I picked the key range of [1, 500,000]. For the second array I picked the key range of [1, 6,000,500,000]. You would expect the second array's sort to take significantly longer than the first if it used either counting sort or pigeonhole sort (just do the math) but it did NOT. To my surprise the performance was the exact same! Below is my sample code and performance results:
func testTupleSortVsIntegerSort()
{
var intArray = [Int]()
for value in 1..<500000 {
intArray.append(value)
}
var tupleArray = intArray.map{($0, "sentinelValue")}
intArray.shuffle()
tupleArray.shuffle()
var startTime = CFAbsoluteTimeGetCurrent()
intArray.sort()
var endTime = CFAbsoluteTimeGetCurrent()
print("Integer sort took \(endTime - startTime) seconds")
startTime = CFAbsoluteTimeGetCurrent()
tupleArray.sort {
return $0.0 < $1.0
}
endTime = CFAbsoluteTimeGetCurrent()
print("Tuple sort took \(endTime - startTime) seconds")
}
func testIntegerSortWithSmallKeyRangeVsVeryLargeKeyRange()
{
var intArrayWithSmallKeys = [Int]()
for value in 1..<500000 {
intArrayWithSmallKeys.append(value)
}
var intArrayWithLargeKeys: [Int] = intArrayWithSmallKeys.map {
if $0 < 250000 {
return $0
} else if $0 < 350000 {
return $0 + 50000000
} else {
return $0 + 6000000000
}
}
intArrayWithSmallKeys.shuffle()
intArrayWithLargeKeys.shuffle()
var startTime = CFAbsoluteTimeGetCurrent()
intArrayWithSmallKeys.sort()
var endTime = CFAbsoluteTimeGetCurrent()
print("Integer sort with small keys took \(endTime - startTime) seconds")
startTime = CFAbsoluteTimeGetCurrent()
intArrayWithLargeKeys.sort()
endTime = CFAbsoluteTimeGetCurrent()
print("Integer sort with large keys took \(endTime - startTime) seconds")
}
Test Case '-[ColorExtractorTests.GeneralPerformanceTest testTupleSortVsIntegerSort]' started.
Integer sort took 0.659438014030457 seconds
Tuple sort took 3.40226799249649 seconds
Test Case '-[ColorExtractorTests.GeneralPerformanceTest testIntegerSortWithSmallKeyRangeVsVeryLargeKeyRange]' started.
Integer sort with small keys took 0.665884971618652 seconds
Integer sort with large keys took 0.669007003307343 seconds
What is going on here? Is there some magical integer sorting algorithm I do not know of? Is my understanding of these sorting algorithms wrong or could I have some flaw in my analysis?