Given the O(n^2) effort and a 5000-element vector, it is worth trying to
- convert your List to a plain double array,
- use a specialized sorting library to sort the array,
- fetch the sort-order indices from the sort, and
- reorder the items of your List according to those indices.
This is feasible since it adds only O(2n) to the effort (two extra copies), which may be negligible and pays off if that overhead is smaller than the gain from the much faster comparisons.
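Even without a specialized sorting library, the idea can be sketched with plain .NET: the Array.Sort(keys, items) overload compares only the key array and reorders both arrays in parallel, so the index bookkeeping comes for free. A minimal sketch (SortByValue is a hypothetical helper; Item is the element type used in the test below):

using System;
using System.Collections.Generic;

public static List<Item> SortByValue(List<Item> unsorted) {
    // extract the sort keys into a plain double array: O(n)
    double[] keys = new double[unsorted.Count];
    Item[] items = unsorted.ToArray();
    for (int i = 0; i < keys.Length; i++) {
        keys[i] = items[i].Value;
    }
    // compares only the doubles and swaps both arrays in sync
    Array.Sort(keys, items);
    // copy back into a List: O(n)
    return new List<Item>(items);
}

The test below fetches explicit sort indices from ILNumerics instead, which keeps the reorder step separate from the sort itself.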
If I find the time, I'll post an example here.
@Edit: I have tested this for demonstration (and my own interest). The following test compares List.Sort() against the copy-sort-copy-back approach. The latter was done with the ILNumerics quick sort. Both versions were run one after the other for all lengths (i.e., not interleaved).
Disclaimer: This is only meant to give a rough overview of whether the method would pay off. On my computer (Core i5, 2.4 GHz, 3 MB L3 cache) it does. But the break-even point is hard to determine. Also, as always with such quick-and-dirty performance measurements, a whole bunch of influences are left out. Probably the most important one: cache effects caused by the multiple copies, which may not be necessary in a real production environment.
The code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using ILNumerics;

namespace ConsoleApplication1 {
    class Program : ILNumerics.ILMath {
        static void Main(string[] args) {
            int numItems = 0, repet = 20000;
            Stopwatch sw01 = new Stopwatch();
            // results are collected in a dictionary:
            // key: list length
            // value: tuple with times taken by ILNumerics and List.Sort()
            var results = new Dictionary<int, Tuple<float, float>>();
            // the comparer used for List.Sort(), see below
            ItemComparer comparer = new ItemComparer();
            // run test for copy-sort-copy back via ILNumerics
            for (numItems = 500; numItems < 50000; numItems = (int)(numItems * 1.3)) {
                Console.Write("\r measuring: {0}", numItems);
                long ms = 0;
                List<Item> a = makeData(numItems);
                for (int i = 0; i < repet; i++) {
                    sw01.Restart();
                    List<Item> b1 = fastSort(a);
                    sw01.Stop();
                    ms += sw01.ElapsedMilliseconds;
                }
                results.Add(numItems, new Tuple<float, float>((float)ms / repet, 0f));
            }
            // run test using the straightforward approach, List.Sort(IComparer)
            for (numItems = 500; numItems < 50000; numItems = (int)(numItems * 1.3)) {
                Console.Write("\r measuring: {0}", numItems);
                List<Item> a = makeData(numItems);
                long ms = 0;
                for (int i = 0; i < repet; i++) {
                    // List.Sort() sorts in place, so sort a fresh copy each run
                    List<Item> copyList = new List<Item>(a);
                    sw01.Restart();
                    copyList.Sort(comparer);
                    sw01.Stop();
                    ms += sw01.ElapsedMilliseconds;
                }
                results[numItems] = new Tuple<float, float>(results[numItems].Item1, (float)ms / repet);
            }
            // print results
            Console.Clear();
            foreach (var v in results)
                Console.WriteLine("Length: {0} | ILNumerics / List.Sort: {1} / {2} ms", v.Key, v.Value.Item1, v.Value.Item2);
            Console.ReadKey();
        }

        public class Item {
            public double Value;
            // ... other data fields
        }

        public class ItemComparer : Comparer<Item> {
            public override int Compare(Item x, Item y) {
                return (x.Value > y.Value) ? 1
                     : (x.Value == y.Value) ? 0 : -1;
            }
        }

        public static List<Item> makeData(int n) {
            List<Item> ret = new List<Item>(n);
            using (ILScope.Enter()) {
                ILArray<double> A = rand(1, n);
                double[] values = A.GetArrayForRead();
                for (int i = 0; i < n; i++) {
                    ret.Add(new Item() { Value = values[i] });
                }
            }
            return ret;
        }

        public static List<Item> fastSort(List<Item> unsorted) {
            //double[] values = unsorted.ConvertAll<double>(item => item.Value).ToArray();
            //// maybe more efficient? saves one O(n) pass:
            //double[] values = new double[unsorted.Count];
            //for (int i = 0; i < values.Length; i++) {
            //    values[i] = unsorted[i].Value;
            //}
            using (ILScope.Enter()) {
                // convert incoming
                ILArray<double> doubles = zeros(unsorted.Count);
                double[] doublesArr = doubles.GetArrayForWrite();
                for (int i = 0; i < doubles.Length; i++) {
                    doublesArr[i] = unsorted[i].Value;
                }
                // do fast sort
                ILArray<double> indices = empty();
                doubles = sort(doubles, Indices: indices);
                // convert outgoing
                List<Item> ret = new List<Item>(unsorted.Count);
                foreach (double i in indices) ret.Add(unsorted[(int)i]);
                return ret;
            }
        }
    }
}
This gave the following output:
Length: 500 | ILNumerics / List.Sort: 0,00395 / 0,0001 ms
Length: 650 | ILNumerics / List.Sort: 0,0003 / 0,0001 ms
Length: 845 | ILNumerics / List.Sort: 0,00035 / 0,0003 ms
Length: 1098 | ILNumerics / List.Sort: 0,0003 / 0,00015 ms
Length: 1427 | ILNumerics / List.Sort: 0,0005 / 0,00055 ms
Length: 1855 | ILNumerics / List.Sort: 0,00195 / 0,00055 ms
Length: 2000 | ILNumerics / List.Sort: 0,00535 / 0,0006 ms
Length: 2600 | ILNumerics / List.Sort: 0,0037 / 0,00295 ms
Length: 3380 | ILNumerics / List.Sort: 0,00515 / 0,0364 ms
Length: 4394 | ILNumerics / List.Sort: 0,0051 / 1,0015 ms
Length: 4500 | ILNumerics / List.Sort: 0,1136 / 1,0057 ms
Length: 5850 | ILNumerics / List.Sort: 0,2871 / 1,0047 ms
Length: 7605 | ILNumerics / List.Sort: 0,5015 / 2,0049 ms
Length: 9886 | ILNumerics / List.Sort: 1,1164 / 2,0793 ms
Length: 12851 | ILNumerics / List.Sort: 1,4236 / 3,6335 ms
Length: 16706 | ILNumerics / List.Sort: 1,6202 / 4,9506 ms
Length: 21717 | ILNumerics / List.Sort: 2,3417 / 6,1871 ms
Length: 28232 | ILNumerics / List.Sort: 3,4038 / 8,7888 ms
Length: 36701 | ILNumerics / List.Sort: 4,4406 / 12,1311 ms
Length: 47711 | ILNumerics / List.Sort: 5,7884 / 16,1002 ms
Here, the break-even point appears to lie around 4000 elements. Larger arrays are always sorted faster by the copy-sort-copy-back approach, by a factor of roughly 3. For smaller arrays it may or may not pay off; the numbers gathered here are too unreliable to tell. I assume that for small lists the sorting time is masked by other effects such as memory management (GC) and so on. Maybe somebody here has more ideas on how to explain this.
Also strange is the jump in execution time for List.Sort at around 4000 items. I have no idea whether List.Sort switches to another (worse) implementation there. A measurement artifact may also contribute: Stopwatch.ElapsedMilliseconds returns whole milliseconds, so per-iteration times just below 1 ms accumulate as 0 while times just above accumulate as 1, and the averaged value jumps as soon as a single sort crosses that threshold.
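To rule out that granularity issue, one could accumulate ticks instead of milliseconds. A sketch of the adjusted timing loop, reusing the variables (a, repet, sw01, comparer) from the listing above:

long ticks = 0;
for (int i = 0; i < repet; i++) {
    List<Item> copyList = new List<Item>(a);
    sw01.Restart();
    copyList.Sort(comparer);
    sw01.Stop();
    // ElapsedTicks keeps the full Stopwatch resolution,
    // while ElapsedMilliseconds truncates to whole milliseconds
    ticks += sw01.ElapsedTicks;
}
double msPerRun = ticks * 1000.0 / Stopwatch.Frequency / repet;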
Regarding the question: it appears to pay off to copy the items into a plain array, sort there, and copy them back if needed. Depending on your hardware, the break-even point may shift up or down. So, as always: profile your implementation!