9

I've implemented a version of quicksort in C#, and performed some quick benchmarks to compare against the C# List<T>.Sort. I have found my implementation to be significantly slower than the library version. I am trying to understand why. Here are some rough benchmark numbers. For input I used a list of integers that were generated randomly (uniform) with few duplicate elements. Note that all benchmark times are averages of several runs.

100,000 elements
My code         0.096 seconds
List<T>.Sort    0.011 seconds

1,000,000 elements
My code         1.10 seconds
List<T>.Sort    0.14 seconds

My code is below. Here is a list of optimizations I've tried and their results:

  • In-line - I've tried to in-line my Swap and ChoosePivotIndex functions I see around a 10% improvement.
  • Pivot selection - I know that I'm using a naive pivot selection method. I've tried using the median of three random elements as well. This does not yield much improvement. I'm guessing this is because the input data I'm using to benchmark is uniformly random.
  • Parallelism - I've tried making the recursive Partition calls in parallel. This seems to actually increase execution time.
  • Special casing small input - I've tried switching to insertion sort for small input (as List<T>.Sort claims to do). That yields around a 20% improvement.

With a combination of these optimizations, I've been able to get my code down to

100,000 elements -   0.062 seconds
1,000,000 elements - 0.740 seconds 

This is still significantly slower than the library Sort. Is there any obvious in my code that would explain the performance gap, or is the remaining 70-80 percent gap from more small tweaks? Note that the code below is my base unoptimized verison

public class Quicksorter<T> where T : IComparable<T>
{
    protected Random random;
    public Quicksorter()
    {
        random = new Random();
    }

    public void Sort(IList<T> items)
    {
        Partition(items, 0, items.Count-1);
    }

    private void Partition(IList<T> items, int startIndex, int endIndex)
    {
        if (startIndex >= endIndex)
            return;

        int pivotIndex = ChoosePivotIndex(items, startIndex, endIndex);
        T pivot = items[pivotIndex];

        // swap pivot and first element
        Swap(items, startIndex, pivotIndex);

        int partitionIndex = startIndex + 1;
        for (int frontierIndex = partitionIndex; frontierIndex <= endIndex; frontierIndex++)
        {
            if (items[frontierIndex].CompareTo(pivot) < 0)
            {
                Swap(items, frontierIndex, partitionIndex);
                partitionIndex++;
            }
        }
        // put pivot back
        items[startIndex] = items[partitionIndex-1];
        items[partitionIndex - 1] = pivot;

        // recursively sort left half
        Partition(items, startIndex, partitionIndex - 2);
        // recursively sort right half
        Partition(items, partitionIndex, endIndex);
    }

    protected virtual int ChoosePivotIndex(IList<T> items, int startIndex, int endIndex)
    {
        return random.Next(startIndex, endIndex);
    }

    private void Swap(IList<T> items, int aIndex, int bIndex)
    {
        T temp = items[aIndex];
        items[aIndex] = items[bIndex];
        items[bIndex] = temp;
    }
}
mattnedrich
  • 7,577
  • 9
  • 39
  • 45
  • just in case - you do build your code in release configuration with optimisations on, right? – Anri Feb 17 '14 at 00:47
  • List.Sort probably utilizes Array.Sort which can use a quicksort or a heapsort for more than trivial inputs. Both quicksort and heapsort performance is dependent on the range and order of input values, so you would have to run it on many different inputs to get real measurement in terms of time. – Keith Payne Feb 17 '14 at 00:55
  • Have you tried modifying the Partition method to accept an Array of T, instead of a List of T... and do the conversion in your Sort method? or does that defeat the purpose? – Kaushal De Silva Feb 17 '14 at 01:32
  • @KaushalDeSilva, is there any motivation/intuition behind doing that? – mattnedrich Feb 17 '14 at 04:02
  • @mattnedrich I have no proof, but firstly I assumed a fixed-length list would be better than a dynamic-length list; secondly, I assumed that maybe an array (due to having a strict length and data type) is stored differently in memory, allowing for faster access. Again, I have no proof. – Kaushal De Silva Feb 17 '14 at 04:38
  • @mattnedrich Also, this question looks at array vs list: http://stackoverflow.com/questions/454916/performance-of-arrays-vs-lists ... but the tests that are being posted are for integers, so may be irrelevant in your case. – Kaushal De Silva Feb 17 '14 at 04:41
  • `List` is implemented with an array underneath. When you call `List.Sort`, it turns around and calls `Array.Sort`. There is no difference in sorting time between arrays and lists (beyond that one extra function call). Modifying `Partition` to take an array would end up requiring that the Quicksort method create an array from the passed `IList`--at the cost of O(n) extra space--and also would require copying it back to the `IList` after the sort is done. That is potentially very expensive. – Jim Mischel Feb 17 '14 at 14:22

1 Answers1

10

Because the .NET Framework has special-case sorting methods for integers, strings, and other built-in types. They don't incur the cost of calling delegates for comparisons, etc. If you're comparing your sort using built-in types, the library sort is typically going to be much, much faster.

Jim Mischel
  • 131,090
  • 20
  • 188
  • 351
  • 3
    `List.Sort` calls `Array.TrySZSort(Array keys, Array items, int left, int right)` which is implemented in native code and handles optimized quicksort for all the primitive value types. [reference](http://bytes.com/topic/c-sharp/answers/247544-integer-sorting-c) – Alex Wiese Feb 17 '14 at 01:02
  • So if I compare using something other than built-in types I should see similar results (or at least more than I'm seeing currently)? Also, along with what @KeithPayne said, is this documented somewhere? – mattnedrich Feb 17 '14 at 01:10
  • @KeithPayne: Not documented anywhere I know of except in the reference source. See http://referencesource.microsoft.com/. – Jim Mischel Feb 17 '14 at 03:09
  • @mattnedrich: If you're sorting complex types and passing a comparison delegate to `List.Sort`, then you can potentially do better than the .NET sort. This is going to be version-specific, though. .NET 4.0 had a pretty well optimized median of three Quicksort. .NET 4.5 uses Introsort, the Quicksort portion of which isn't quite as elaborate as what was in .NET 4.0. See http://zimbry.blogspot.com/2012/01/nets-sort-is-not-secure-dont-use-it.html for a Quicksort that outperformed the .NET Quicksort. It's unnecessary now, of course, because .NET 4.5 uses Introsort. – Jim Mischel Feb 17 '14 at 03:19
  • 1
    The relevant code is in http://www.dotnetframework.org/default.aspx/DotNET/DotNET/8@0/untmp/whidbey/REDBITS/ndp/clr/src/BCL/System/Array@cs/2/Array@cs. Search for "TrySZSort". – Jim Mischel Feb 17 '14 at 03:25