3

I have an unsorted List with 100 million entries. RAM is not an issue. I am not all together happy with the speed of List.Sort and was wondering whether there is a reliable technique to sort my list to utilize multiple processors.

I looked at Parallel.Invoke but can't figure out how to actually use it for sorting, perhaps it can't be. Any ideas?

AngryHacker
  • 59,598
  • 102
  • 325
  • 594
  • I don't know of an existing implementation in .NET, but algorithmically speaking, I know there's an O(log n) algorithm that's the equivalent of parallel mergesort. There's also an easier O(log^2 n) algorithm called [bitonic sort](http://en.wikipedia.org/wiki/Bitonic_sorter). – user541686 May 17 '12 at 18:40
  • Not sure how multiple CPU's come into play here. But if all other straightforward methods fail, I'd try to implement Binary tree – Val Bakhtin May 17 '12 at 18:44
  • You said that "RAM is not an issue". Is it not an issue to the point that you can fit all your data in memory, or do you have additional memory to fit your data twice? If you have twice the memory, you could sort in parallel `K` sub-lists of length `N/K`, where `N` is the total length and `K` is the number of CPUs, and then run a `K`-way merge on a single CPU. While you can do merges in place (search Kronrod's algorithm if you're curious) the implementation is too difficult to attempt as an optimization. – Sergey Kalinichenko May 17 '12 at 18:52
  • @dasblinkenlight I can fit everything into RAM 100 times over. – AngryHacker May 17 '12 at 19:07
  • @AngryHacker You can try the sort/merge approach then: splitting `K` ways is trivial, and merging `K` ways is only marginally more difficult than the classic two-way merging. – Sergey Kalinichenko May 17 '12 at 19:09
  • possible duplicate of [Parallel Sort Algorithm](http://stackoverflow.com/questions/1897458/parallel-sort-algorithm) – ataylor May 18 '12 at 18:42

1 Answers1

3

In another StackOverflow question, I found this:

private void QuicksortSequential<T>(T[] arr, int left, int right)  
where T : IComparable<T> 
{ 
    if (right > left) 
    { 
        int pivot = Partition(arr, left, right); 
        QuicksortSequential(arr, left, pivot - 1); 
        QuicksortSequential(arr, pivot + 1, right); 
    } 
} 

private void QuicksortParallelOptimised<T>(T[] arr, int left, int right)  
where T : IComparable<T> 
{ 
    const int SEQUENTIAL_THRESHOLD = 2048; 
    if (right > left) 
    { 
        if (right - left < SEQUENTIAL_THRESHOLD) 
        { 

            QuicksortSequential(arr, left, right); 
        } 
        else 
        { 
            int pivot = Partition(arr, left, right); 
            Parallel.Do( 
                () => QuicksortParallelOptimised(arr, left, pivot - 1), 
                () => QuicksortParallelOptimised(arr, pivot + 1, right)); 
        } 
    } 
} 

Different (and more complete, though not sure about 'better') implementation from answers to same question:

/// <summary> 
/// Parallel quicksort algorithm. 
/// </summary> 
public class ParallelSort 
{ 
    #region Public Static Methods 

    /// <summary> 
    /// Sequential quicksort. 
    /// </summary> 
    /// <typeparam name="T"></typeparam> 
    /// <param name="arr"></param> 
    public static void QuicksortSequential<T>(T [] arr) where T : IComparable<T> 
    { 
        QuicksortSequential(arr, 0, arr.Length - 1); 
    } 

    /// <summary> 
    /// Parallel quicksort 
    /// </summary> 
    /// <typeparam name="T"></typeparam> 
    /// <param name="arr"></param> 
    public static void QuicksortParallel<T>(T[] arr) where T : IComparable<T> 
    { 
        QuicksortParallel(arr, 0, arr.Length - 1); 
    } 

    #endregion 

    #region Private Static Methods 

    private static void QuicksortSequential<T>(T[] arr, int left, int right)  
        where T : IComparable<T> 
    { 
        if (right > left) 
        { 
            int pivot = Partition(arr, left, right); 
            QuicksortSequential(arr, left, pivot - 1); 
            QuicksortSequential(arr, pivot + 1, right); 
        } 
    } 

    private static void QuicksortParallel<T>(T[] arr, int left, int right)  
        where T : IComparable<T> 
    { 
        const int SEQUENTIAL_THRESHOLD = 2048; 
        if (right > left) 
        { 
            if (right - left < SEQUENTIAL_THRESHOLD) 
            { 
                QuicksortSequential(arr, left, right); 
            } 
            else 
            { 
                int pivot = Partition(arr, left, right); 
                Parallel.Invoke(new Action[] { delegate {QuicksortParallel(arr, left, pivot - 1);}, 
                                               delegate {QuicksortParallel(arr, pivot + 1, right);} 
                }); 
            } 
        } 
    } 

    private static void Swap<T>(T[] arr, int i, int j) 
    { 
        T tmp = arr[i]; 
        arr[i] = arr[j]; 
        arr[j] = tmp; 
    } 

    private static int Partition<T>(T[] arr, int low, int high)  
        where T : IComparable<T> 
    { 
        // Simple partitioning implementation 
        int pivotPos = (high + low) / 2; 
        T pivot = arr[pivotPos]; 
        Swap(arr, low, pivotPos); 

        int left = low; 
        for (int i = low + 1; i <= high; i++) 
        { 
            if (arr[i].CompareTo(pivot) < 0) 
            { 
                left++; 
                Swap(arr, i, left); 
            } 
        } 

        Swap(arr, low, left); 
        return left; 
    } 

    #endregion 
}
Community
  • 1
  • 1
Tim
  • 14,999
  • 1
  • 45
  • 68
  • Do you have an implementation of the Partition method? Also my copy of Parallel class (.net 4) does not have a Do method. – AngryHacker May 17 '12 at 18:43
  • In the other question I linked to, there's another implementation that shows `Partition` and uses `Parallel.Invoke` rather than `Parallel.Do`. That might work for you instead. – Tim May 17 '12 at 18:51
  • That is odd, the parallel implementation takes double the time on a Dual Core box. Let me try it on a 24 CPU box, see if any improvements there. – AngryHacker May 17 '12 at 19:16
  • Just an FYI, once I got to an 8 CPU box, the parallel solution started to be slightly faster than a sequential one. – AngryHacker May 17 '12 at 21:33
  • It'd a good idea to not only have a threshold for the parallel version, but also change to InsertionSort for the serial version under some specific threshold (usually something in the order of 8-20 elements). That should give another speedup compared to the library version (which certainly does that too) – Voo May 21 '12 at 13:30