
We have an array of objects. Each object has a double Value field, and the array should be sorted by this Value. The array parameters are:

  1. The size ranges from 1 to 10,000 elements, 100 to 5,000 most of the time; more than 10,000 is REALLY unlikely
  2. Values have a distribution close to normal
  3. Values are inserted only once and do not change afterwards (no re-sorting of an almost-sorted array)
  4. We sort many such data samples

Now we use OrderBy and do something like this:

public class Item
{
    public double Value;
    //... some other data fields
}

List<Item> items;                              // fill items
items = items.OrderBy(p => p.Value).ToList();  // sort by Value

It is known that:

  1. List.Sort (like Array.Sort) is an implementation of the quicksort algorithm.
  2. Quicksort is most appropriate for a uniform distribution of doubles.
  3. The OrderBy implementation of sorting looks worse than List.Sort for our case.

Still, benchmarks show that this sorting eats 95% of our software's processing time.

Is there a faster sorting implementation for this particular case?

MajesticRa
  • If you are going to fill a list, why not just use insertion sorting? – leppie Feb 06 '12 at 06:01
  • This link may be useful: http://codegolf.stackexchange.com/questions/2774/contest-fastest-way-to-sort-a-big-array-of-gaussian-distributed-data – templatetypedef Feb 06 '12 at 06:02
  • Calling OrderBy does not sort the list in place. It creates an enumerator that returns the elements of the list in sorted order. Your sample code is therefore incorrect. Also, the "looks slightly better" link points to a performance test that compares `List.Sort` against `Enumerable.OrderBy` on *already-sorted input*, so that test is quite likely to be irrelevant to your use case. – phoog Feb 06 '12 at 06:03
  • @templatetypedef The question looks very similar. I will reproduce the answer in C# code and try a benchmark. – MajesticRa Feb 06 '12 at 06:12
  • @leppie could you, please, be more specific? – MajesticRa Feb 06 '12 at 06:12
  • @phoog there is certainly an error in the example. Thank you. There is a case in "List.Sort against Enumerable.OrderBy" with an array of random strings. Anyway, the difference is very slight. – MajesticRa Feb 06 '12 at 06:12
  • @MajesticRa the case with random strings, like the other cases, *sorts the list before starting the stopwatch to time the sort algorithms!* So even though the strings were generated randomly, the timings represent sorting a list whose elements are already sorted. I added an answer to that question, with the result that OrderBy can be 25% slower than List.Sort. – phoog Feb 06 '12 at 06:56
  • @MajesticRa: Use a sorted list. You say the data is 'almost' sorted, so filling a sorted list should not be that expensive. – leppie Feb 06 '12 at 06:57
  • @phoog thank you very much! I indeed missed that point! – MajesticRa Feb 06 '12 at 07:03
  • @leppie I think "no resorting of almost-sorted array" was not meant to imply that the input data is nearly in the correct output order. Rather, it seems to be underscoring the fact that objects' `Value` properties will not change while the sort algorithm is running. – phoog Feb 06 '12 at 07:05
  • @leppie "no resorting of almost-sorted array" means the array is ONE time filled. ONE time sorted. And is NOT changed after that. There is NO situation when afterwards you change one or two values and re-sort the array (which would be almost sorted in that situation). – MajesticRa Feb 06 '12 at 07:11

3 Answers


The problem with evaluating sorting algorithms is that many factors affect the outcome. I wouldn't put much trust in the website you linked; it is probably benchmarking the JavaScript engine, the visualization, and the JavaScript implementation more than the actual sorting algorithm.

Heapsort has theoretically nice properties, but is not able to exploit modern CPU optimizations much.

Quicksort is theoretically worse, but common tricks such as median-of-3 and median-of-9 pivot selection make the bad cases really unlikely, and because it processes the array linearly it is easy for the CPU to optimize.

Mergesort is good when you do not need in-place sorting. Using it in place and optimizing it for presorted and almost-presorted data is not trivial, but you might have a look at Timsort, which is used by Python and Java 7.

There is no general answer such as "quicksort is bad for Gaussian-distributed data". There is a huge gap between theory and practice. In theory, insertion sort and quicksort are worse than heapsort. In reality, a well-optimized quicksort is hard to beat in most situations, in particular because it is easy to optimize and benefits from CPU caching. Timsort is not a plain mergesort; it is actually a hybrid with insertion sort, to optimize for the common case of already-sorted runs of objects.

Secondly, and this should be rather obvious, none of the mentioned sorting algorithms actually computes the difference of two objects. As such, any distribution that doesn't produce lots of duplicates will look the same to them. All they see is "less than, equals, larger than". So the only difference between distributions is how many objects are equal! In fact, only bucket sorting algorithms (e.g. radix sort) should care about the object distribution, as they use the actual values beyond a <=> comparison.

For your particular case, you need to check how your list is organized. A linked list is really bad for sorting. In fact, Java, if I recall correctly, will convert just about any collection to a native array for sorting and rebuild the collection afterwards. Secondly, the p => p.Value notation is pretty, but may come at quite some cost, too. It may cause various additional objects to be created, managed, and garbage collected.

The first thing you should try is to check whether e.g. a full comparator is faster than the lambda syntax; see the sketch below. Then look at the memory management. Most likely, that is where the actual work happens: copying and converting doubles around unnecessarily.
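For instance, here is a minimal sketch of that check (assuming the Item class from the question with a public double Value, plus the usual System.Linq and System.Collections.Generic imports); which variant wins is exactly what you would measure:

// Explicit comparer: in-place sort, no key buffer, no deferred enumerator.
public sealed class ItemValueComparer : IComparer<Item>
{
    public int Compare(Item x, Item y)
    {
        return x.Value.CompareTo(y.Value);
    }
}

// ...
items.Sort(new ItemValueComparer());                       // comparer variant

// LINQ variant: extracts all keys into a buffer and builds a new list.
List<Item> ordered = items.OrderBy(p => p.Value).ToList();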

For example, C# might build an inverse mapping for your dataset ("double -> index"), sort that array, and then use the mapping to reorder your data. That would be good if your key function were expensive and should only be computed once per element; a sketch follows below.
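A rough sketch of that idea with plain BCL calls: the Array.Sort(keys, items) overload evaluates each key exactly once and permutes both arrays in lockstep (again assuming the Item class from the question):

// Extract the sort keys once, then let Array.Sort reorder both arrays together.
double[] keys = new double[items.Count];
Item[] data = items.ToArray();
for (int i = 0; i < keys.Length; i++)
{
    keys[i] = data[i].Value;    // key computed exactly once per element
}
Array.Sort(keys, data);         // sorts keys, applies the same permutation to data
items = new List<Item>(data);   // copy back if a List<Item> is required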

Has QUIT--Anony-Mousse

Because the sorting effort (O(n log n) on average, up to O(n^2) in the worst case) dominates the linear cost of copying for a 5000-element vector, it is worth a try to

  1. Convert your List to a plain double array,
  2. Use some specialized sorting library to sort the array
  3. Fetch the sort order indices from the sort and
  4. Reorder the items of your List according to the indices

This is feasible, since it adds only O(2n) (two linear passes) to the effort; hence it may be negligible and pays off if the overhead is less than the gain from much faster comparisons.

If I find the time, I'll post an example here.

Edit: I have tested this for demonstration (and my own interest). The following test compares List.Sort() against the copy-sort-copy-back approach. The latter was done with the ILNumerics quicksort. Both versions were run one after the other for all lengths (i.e. not interleaved).

Disclaimer: This is only a rough overview of whether the method would pay off. On my computer (Core i5, 2.4 GHz, 3 MB L3 data cache) it does, but the break-even point is hard to determine. Also, as always with such quick-and-dirty performance measurements, a whole bunch of influences is left out. Probably the most important: cache effects due to the extra copies, which may not be necessary in a real production environment.

The code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using ILNumerics;

namespace ConsoleApplication1 {
unsafe class Program : ILNumerics.ILMath {

static void Main(string[] args) {
    int numItems = 0, repet = 20000; 

    Stopwatch sw01 = new Stopwatch();
    // results are collected in a dictionary: 
    // key: list length
    // value: tuple with times taken by ILNumerics and List.Sort()
    var results = new Dictionary<int, Tuple<float,float>>();
    // the comparer used for List.Sort() see below 
    ItemComparer comparer = new ItemComparer();

    // run test for copy-sort-copy back via ILNumerics
    for (numItems = 500; numItems < 50000; numItems = (int)(numItems * 1.3)) {

        Console.Write("\r measuring: {0}", numItems);  
        long ms = 0;
        List<Item> a = makeData(numItems);
        for (int i = 0; i < repet; i++) {
            sw01.Restart();
            List<Item> b1 = fastSort(a);
            sw01.Stop();
            ms += sw01.ElapsedMilliseconds;
        }
        results.Add(numItems,new Tuple<float,float>((float)ms / repet, 0f)); 
    }

    // run test using the straightforward approach, List.Sort(IComparer)
    for (numItems = 500; numItems < 50000; numItems = (int)(numItems * 1.3)) {

        Console.Write("\r measuring: {0}", numItems);  
        List<Item> a = makeData(numItems);
        long ms = 0;
        for (int i = 0; i < repet; i++) {
            List<Item> copyList = new List<Item>(a);
            sw01.Restart();
            copyList.Sort(comparer);
            sw01.Stop();
            ms += sw01.ElapsedMilliseconds;
        }
        results[numItems] = new Tuple<float, float>(results[numItems].Item1, (float)ms / repet); 
    }

    // Print results
    Console.Clear(); 
    foreach (var v in results) 
        Console.WriteLine("Length: {0} | ILNumerics/CLR: {1} / {2} ms", v.Key, v.Value.Item1, v.Value.Item2);
    Console.ReadKey(); 
}
public class Item {
    public double Value;
    //... some other data fields
}

public class ItemComparer : Comparer<Item> {
    public override int Compare(Item x, Item y) {
        return (x.Value > y.Value)  ? 1 
             : (x.Value == y.Value) ? 0 : -1;
    }
}


public static List<Item> makeData(int n) {
    List<Item> ret = new List<Item>(n); 
    using (ILScope.Enter()) {
        ILArray<double> A = rand(1,n);
        double[] values = A.GetArrayForRead(); 
        for (int i = 0; i < n; i++) {
            ret.Add(new Item() { Value = values[i] }); 
        }
    }
    return ret; 
}

public static List<Item> fastSort(List<Item> unsorted) {
    //double [] values = unsorted.ConvertAll<double>(item => item.Value).ToArray(); 

    //// maybe more efficient? saves one O(n) pass
    //double[] values = new double[unsorted.Count];
    //for (int i = 0; i < values.Length; i++) {
    //    values[i] = unsorted[i].Value;
    //}
    using (ILScope.Enter()) {
        // convert incoming
        ILArray<double> doubles = zeros(unsorted.Count);
        double[] doublesArr = doubles.GetArrayForWrite();
        for (int i = 0; i < doubles.Length; i++) {
            doublesArr[i] = unsorted[i].Value;
        }

        // do fast sort 
        ILArray<double> indices = empty();
        doubles = sort(doubles, Indices: indices);

        // convert outgoing
        List<Item> ret = new List<Item>(unsorted.Count); 
        foreach (double i in indices) ret.Add(unsorted[(int)i]); 
        return ret; 
    }
}
}
}

This gave the following output:

Length: 500 | ILNumerics / List.Sort: 0,00395 / 0,0001 ms
Length: 650 | ILNumerics / List.Sort: 0,0003 / 0,0001 ms
Length: 845 | ILNumerics / List.Sort: 0,00035 / 0,0003 ms
Length: 1098 | ILNumerics / List.Sort: 0,0003 / 0,00015 ms
Length: 1427 | ILNumerics / List.Sort: 0,0005 / 0,00055 ms
Length: 1855 | ILNumerics / List.Sort: 0,00195 / 0,00055 ms
Length: 2000 | ILNumerics / List.Sort: 0,00535 / 0,0006 ms
Length: 2600 | ILNumerics / List.Sort: 0,0037 / 0,00295 ms
Length: 3380 | ILNumerics / List.Sort: 0,00515 / 0,0364 ms
Length: 4394 | ILNumerics / List.Sort: 0,0051 / 1,0015 ms
Length: 4500 | ILNumerics / List.Sort: 0,1136 / 1,0057 ms
Length: 5850 | ILNumerics / List.Sort: 0,2871 / 1,0047 ms
Length: 7605 | ILNumerics / List.Sort: 0,5015 / 2,0049 ms
Length: 9886 | ILNumerics / List.Sort: 1,1164 / 2,0793 ms
Length: 12851 | ILNumerics / List.Sort: 1,4236 / 3,6335 ms
Length: 16706 | ILNumerics / List.Sort: 1,6202 / 4,9506 ms
Length: 21717 | ILNumerics / List.Sort: 2,3417 / 6,1871 ms
Length: 28232 | ILNumerics / List.Sort: 3,4038 / 8,7888 ms
Length: 36701 | ILNumerics / List.Sort: 4,4406 / 12,1311 ms
Length: 47711 | ILNumerics / List.Sort: 5,7884 / 16,1002 ms

Here, the break-even point appears to lie around 4000 elements. Larger arrays are always sorted faster by the copy-sort-copy-back approach, by a factor of roughly 3. I suppose for smaller arrays it may pay off, or it may not; the numbers gathered here are too unreliable. I assume that for small lists the sorting time is masked by other issues like memory management (GC) and so on. Maybe somebody here has more ideas on how to explain this.

Also strange is the jump in execution time for List.Sort at a length of around 4000 items. I have no idea whether List.Sort switches to another (worse) implementation there.

Regarding the question, it appears to pay off to copy the items, sort them in a plain array, and copy them back if needed. Depending on your hardware, the break-even point may shift up or down. So, as always: profile your implementation!

user492238

One solution could be to create bins from 0 to 1 in steps of 0.001 (or any arbitrary bin width p; note that the expected count per bin is p*N). Now, for each number in the array, compute its cumulative probability (the cumulative probability of -infinity is 0, of the mean is 0.5, and of +infinity is 1.0) and put the number into the corresponding bin. Sort each bin separately using your favorite sorting algorithm and merge the results; see the sketch below. If you select p such that p*N = k (k a constant), this algorithm is O(N log k) in the best and average case.
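A minimal sketch of this approach, assuming the Item class from the question and a hypothetical CdfBucketSort helper; the normal CDF is approximated with the Abramowitz-Stegun erf formula, which is accurate enough for assigning buckets:

using System;
using System.Collections.Generic;
using System.Linq;

static class CdfBucketSort
{
    // Standard normal CDF via the Abramowitz-Stegun erf approximation;
    // only used to choose a bucket, so modest accuracy is fine.
    static double NormalCdf(double x, double mean, double stdDev)
    {
        double z = (x - mean) / (stdDev * Math.Sqrt(2.0));
        double t = 1.0 / (1.0 + 0.3275911 * Math.Abs(z));
        double erf = 1.0 - (((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                   - 0.284496736) * t + 0.254829592) * t) * Math.Exp(-z * z);
        return 0.5 * (1.0 + Math.Sign(z) * erf);
    }

    public static List<Item> Sort(List<Item> items, int bucketCount)
    {
        double mean = items.Average(i => i.Value);
        double std = Math.Sqrt(items.Average(i => (i.Value - mean) * (i.Value - mean)));
        if (std == 0.0) return new List<Item>(items);    // all values equal

        // Distribute items into buckets by their cumulative probability.
        var buckets = new List<Item>[bucketCount];
        for (int b = 0; b < bucketCount; b++) buckets[b] = new List<Item>();
        foreach (Item item in items)
        {
            int b = (int)(NormalCdf(item.Value, mean, std) * bucketCount);
            if (b == bucketCount) b--;                   // clamp CDF == 1.0
            buckets[b].Add(item);
        }

        // The CDF is monotone, so sorting each bucket and concatenating
        // the buckets in order yields a fully sorted list; no merge needed.
        var result = new List<Item>(items.Count);
        foreach (var bucket in buckets)
        {
            bucket.Sort((a, c) => a.Value.CompareTo(c.Value));
            result.AddRange(bucket);
        }
        return result;
    }
}

Here bucketCount corresponds to 1/p, so choosing bucketCount proportional to N keeps the expected bucket size constant, which is what gives the O(N log k) behavior described above.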

ElKamina