
I am sorting arrays of integer keys.

Information about the data:

  • Arrays are 1176 elements long
  • Keys are between 750 000 and 135 000 000; also 0 is possible
  • There are a lot of duplicates; every array contains only between 48 and 100 distinct keys, but it's impossible to predict which values out of the whole range those will be
  • There are a lot of long sorted subsequences; most arrays consist of anywhere between 33 and 80 sorted subsequences
  • The smallest element is 0; the number of 0s is predictable and falls in a very narrow range, about 150 per array

What I tried so far:

  1. stdlib.h qsort;

    this is slow; right now my function spends 0.6s on sorting per execution, while with stdlib.h qsort it's 1.0s; std::sort has the same performance

  2. Timsort;

    I tried this: https://github.com/swenson/sort and this: http://code.google.com/p/timsort/source/browse/trunk/timSort.c?spec=svn17&r=17; both were significantly slower than stdlib qsort

  3. http://www.ucw.cz/libucw/ ;

    their combination of quicksort and insertion sort is the fastest for my data so far; I experimented with various settings, and taking the pivot as the middle element (not median of 3) and switching to insertion sort at 28-element subarrays (not 8 as default) gives the best performance

  4. shell sort;

    simple implementation with gaps from this article: http://en.wikipedia.org/wiki/Shellsort; it was decent, although slower than stdlib qsort


My thinking is that qsort does a lot of swapping around and ruins (i.e. reverses) sorted subsequences, so there should be some way to improve on it by exploiting the structure of the data; unfortunately, all my attempts have failed so far.
If you are curious what kind of data this is: these are sets of poker hands evaluated on various boards, already sorted on the previous board (which is where the sorted subsequences come from).

The function is in C. I use Visual Studio 2010. Any ideas?

Sample data: http://pastebin.com/kKUdnU3N
Sample full execution (1176 sorts): https://dl.dropbox.com/u/86311885/out.zip

Piotr Lopusiewicz
  • Can you post a sample? For questions like this, it helps if we can benchmark directly. – Mahmoud Al-Qudsi Jun 19 '12 at 02:23
  • sample posted; >> radix sort: for that I would need a 135 000 000-element array and to iterate over it :) If I could devise some smart, fast hash then maybe... btw I could post a sample dataset (1k sorts) if you would like to benchmark more :) – Piotr Lopusiewicz Jun 19 '12 at 02:29
  • @PiotrLopusiewicz: Can you use C++? map in C++ STL makes a quick solution to this. Or you can code balanced tree to use as a set. – nhahtdh Jun 19 '12 at 02:34
  • @nhahtdh STL Map is not very efficient compared to most problem-specific algorithms, mostly because of its frequent allocations and moves. (Time it and see.) – Crashworks Jun 19 '12 at 03:15
  • @Crashworks: You may be right, since the input size is small (1000 range). On larger input that has heavily repeated keys, it should be faster than sorting algorithms. – nhahtdh Jun 19 '12 at 03:38
  • @PiotrLopusiewicz It's counting sort that uses 135 000 000 elements array, space complexity of radix sort is O(K.N) where in your case N is 1176 and K is 9. – saeedn Jun 19 '12 at 04:39
  • If you describe what you plan to do with the sorted data (what sort of queries you plan to do) there may be a solution to the overall problem which is not bottlenecked by an actual sort of the data. – Ben Jackson Jun 19 '12 at 07:49

8 Answers


What if you first do a pass through the array to group the numbers and get rid of duplicates? Each number could go into a hashtable where the number is the key and the number of times it appears is the value. So if the number 750 000 appears 57 times in the array, the hashtable would hold key=750000; value=57. Then you can sort the much smaller hashtable by keys, which should be fewer than 100 elements.

With this you only need to make one pass through the array, and another pass through the much smaller hashtable key list. This should avoid most of the swaps and comparisons.
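
A rough C sketch of the idea (the table size and names are my own; it assumes non-negative keys and comfortably fewer distinct values than table slots, per the question's description):

    #include <stdlib.h>
    #include <string.h>

    #define TABLE_SIZE 257                     /* prime, well above the ~100 distinct keys */

    struct slot { int key; int count; };

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    void sort_by_counting(int *a, size_t n)
    {
        struct slot table[TABLE_SIZE];
        int keys[TABLE_SIZE];
        size_t i, k, nkeys = 0;

        memset(table, 0xFF, sizeof table);     /* key = -1 marks an empty slot (keys are >= 0) */

        /* one pass: count how many times each distinct key occurs */
        for (i = 0; i < n; i++) {
            size_t h = (size_t)a[i] % TABLE_SIZE;
            while (table[h].key != -1 && table[h].key != a[i])
                h = (h + 1) % TABLE_SIZE;      /* linear probing */
            if (table[h].key == -1) {          /* first occurrence of this key */
                table[h].key = a[i];
                table[h].count = 0;
                keys[nkeys++] = a[i];
            }
            table[h].count++;
        }

        /* sort the (small) list of distinct keys */
        qsort(keys, nkeys, sizeof keys[0], cmp_int);

        /* write each key back as many times as it occurred */
        for (i = 0, k = 0; k < nkeys; k++) {
            size_t h = (size_t)keys[k] % TABLE_SIZE;
            while (table[h].key != keys[k])
                h = (h + 1) % TABLE_SIZE;
            while (table[h].count-- > 0)
                a[i++] = keys[k];
        }
    }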

Oleksi
  • I thought about it... I think I need a fast monotone hash for that, which would compress the range into 1..100 or something and be reasonably distributed; any suggestions? – Piotr Lopusiewicz Jun 19 '12 at 02:31
  • @PiotrLopusiewicz You could take the numbers mod some prime for your hash function. For instance try 97. – btilly Jun 19 '12 at 02:32
  • @PiotrLopusiewicz Modulo some prime should work well enough. If not, you can configure this easily enough. – Oleksi Jun 19 '12 at 02:33
  • I will try this, sounds good; it's not monotone hash though so if it doesn't distribute perfectly I don't gain anything ? Or am I missing something here ? – Piotr Lopusiewicz Jun 19 '12 at 03:05
  • 1
    A monotonous hash isn't needed (or possible). Just build the hash table (a closed hash looks appropriate), then get the keys and sort them normally (just 33-80 keys, no problem), and finally write each key the correct number of times. – ugoren Jun 19 '12 at 06:27
  • 1
    I will try to implement that; so far I tried just calculating hashes (% 91) and it already takes half the time of whole sort without even putting elements in place and allocating memory; still if sort of the shorter array takes no time comparing to the big one it will be significant gain; I will let you know :) – Piotr Lopusiewicz Jun 19 '12 at 21:35

You can check out this animation, which I saw in this post

I think your problem falls into the "few unique" category, where 3-way partition quick sort and shell sort are very fast.

update:

I implemented some sorting algorithms based on the pseudo code on sorting-algorithms.com and ran them on the sample data given by the OP. Just for fun:

insertion         0.154s
shell             0.031s
quick sort        0.018s
radix             0.017s
3-way quick sort  0.013s
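
For reference, here is a minimal C sketch of the 3-way (Dutch national flag) partitioning scheme; this is not the exact code behind the timings above, just the idea:

    /* 3-way partitioning quicksort: elements equal to the pivot end up grouped
     * in the middle and are never touched again, which helps a lot when a
     * 1176-element array holds only ~50-100 distinct keys. */
    static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

    void quicksort3(int *a, int lo, int hi)
    {
        int lt, gt, i, pivot;

        if (hi <= lo)
            return;

        pivot = a[lo + (hi - lo) / 2];   /* middle pivot, as the OP found works well */
        lt = lo; gt = hi; i = lo;

        while (i <= gt) {
            if (a[i] < pivot)
                swap_int(&a[i++], &a[lt++]);
            else if (a[i] > pivot)
                swap_int(&a[i], &a[gt--]);
            else
                i++;
        }
        quicksort3(a, lo, lt - 1);       /* elements < pivot */
        quicksort3(a, gt + 1, hi);       /* elements > pivot */
    }

Called as quicksort3(array, 0, n - 1).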

xvatar
  • Any chance you could make your implementation of 3-way quick sort available? The algorithm which is the fastest for me so far is quicksort + insertion sort (I linked to it in the OP), but maybe 3-way quick sort will be even better because of the large number of duplicates. – Piotr Lopusiewicz Jun 21 '12 at 14:21
  • @PiotrLopusiewicz sorry I just clean up all these files 1 hour ago.. however what I implemented was strictly consistent with the pseudo code [here](http://www.sorting-algorithms.com/quick-sort-3-way). It should be fairly easy for you. Good luck! – xvatar Jun 21 '12 at 16:00
  • I was able to beat stdlib qsort with an implementation of 3-way quicksort, but I am far from beating the ucw guys' implementation. I think I am done trying; thanks for the suggestions, all :) – Piotr Lopusiewicz Jun 22 '12 at 01:39
  • @PiotrLopusiewicz Nice try! Glad to help :) – xvatar Jun 22 '12 at 02:31

Seems like a radix sort or a bucket sort would be the way to go, since they can be efficient on integers.

Radix sort's efficiency is O(k·n) for n keys which have k or fewer digits. Sometimes k is presented as a constant, which would make radix sort better (for sufficiently large n) than the best comparison-based sorting algorithms, which are all O(n·log(n)). Bucket sort, meanwhile, is O(n·k) for n keys and k buckets.

It may come down to the constant factor (k) for radix sort. Below are results from my Java experimentation. It's also worth noting that radix sort doesn't fare so well with already-sorted elements.

100k integers:

Algorithm           Random  Sorted  Reverse Sorted
Merge sort          0.075   0.025   0.025
Quicksort           0.027   0.014   0.015
Heap sort           0.056   0.03    0.03
Counting sort       0.022   0.002   0.004
Radix sort          0.047   0.018   0.016

500k integers:

Algorithm           Random  Sorted  Reverse Sorted
Merge sort          0.286   0.099   0.084
Quicksort           0.151   0.051   0.057
Heap sort           0.277   0.134   0.098
Counting sort       0.046   0.012   0.01
Radix sort          0.152   0.088   0.079

1M integers:

Algorithm           Random  Sorted  Reverse Sorted
Merge sort          0.623   0.18    0.165
Quicksort           0.272   0.085   0.084
Heap sort           0.662   0.286   0.207
Counting sort       0.066   0.022   0.016
Radix sort          0.241   0.2     0.164

10M integers:

Algorithm           Random  Sorted  Reverse Sorted
Merge sort          7.086   2.133   1.946
Quicksort           4.148   0.88    0.895
Heap sort           11.396  3.283   2.503
Counting sort       0.638   0.181   0.129
Radix sort          2.856   2.909   3.901

It seems like 500k items is when the constant starts favoring radix sort over quicksort.
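
For completeness, a typical byte-wise LSD radix sort for 32-bit non-negative keys looks roughly like this in C (a sketch, not the Java code used for the numbers above):

    #include <stdlib.h>
    #include <string.h>

    /* LSD radix sort on 8-bit digits: four counting passes for 32-bit non-negative
     * keys, using a scratch buffer the same size as the input. */
    void radix_sort_u32(unsigned int *a, size_t n)
    {
        unsigned int *buf = malloc(n * sizeof *buf);
        size_t count[256], i;
        int shift;

        if (!buf)
            return;                                   /* allocation failed; real code should report it */

        for (shift = 0; shift < 32; shift += 8) {
            memset(count, 0, sizeof count);
            for (i = 0; i < n; i++)                   /* histogram of the current byte */
                count[(a[i] >> shift) & 0xFF]++;
            for (i = 1; i < 256; i++)                 /* prefix sums -> bucket end offsets */
                count[i] += count[i - 1];
            for (i = n; i-- > 0; )                    /* stable scatter, back to front */
                buf[--count[(a[i] >> shift) & 0xFF]] = a[i];
            memcpy(a, buf, n * sizeof *a);
        }
        free(buf);
    }

Allocating the scratch buffer once and reusing it across all 1176 sorts avoids repeated allocation, which can otherwise dominate at n = 1176.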

Justin
  • Whatever I tried with radix, it's always too slow. I guess allocating memory for the buckets slows the thing down; I wasn't able to achieve even comparable speed with radix, it's several times slower for me. – Piotr Lopusiewicz Jun 19 '12 at 21:36
  • Did you implement your own or find one online? There are a number of optimizations that can be done. If you made your own, try a well-defined version. http://opensource.apple.com/source/Libc/Libc-167/stdlib.subproj/radixsort.c – Justin Jun 20 '12 at 19:10
  • Also, radix sort can be done in parallel pretty well; if that's an option for you. – Justin Jun 20 '12 at 19:12
  • @PiotrLopusiewicz I've updated my answer to reflect my experimentations. – Justin Jun 20 '12 at 20:18
  • Thanks for the link I will try that implementation – Piotr Lopusiewicz Jun 21 '12 at 14:20
  • @PiotrLopusiewicz I've also found a radix sort which only requires O(1) extra storage. It's called American Flag sort. Maybe that'll help. – Justin Jun 21 '12 at 17:36

There is an algorithm that takes advantage of sorted sub-sequences: a variant of merge sort called natural merge sort. I can't find a good example of an implementation in C, but it doesn't look too hard to implement from scratch. Basically it goes something like this:

  1. You need a struct containing two ints, the index and length of a sub-sequence. Create a new array (or probably a linked list) of these structs.
  2. Iterate through your entire array once; every time a value is smaller than the previous value, it marks the start of a new sub-sequence, so create a new struct, record the position where the new sub-sequence starts, and assign the length of the just-finished sub-sequence to the previous struct.
  3. Iterate through your structs and perform the merge operation on them in pairs.
  4. Repeat step 3 until all are merged.

The merge operation is the same as the merge operation in Merge Sort. You have a pointer to the start of each sub-sequence. Whichever is smaller should be at the start of the sub-sequence, so move it there if it isn't already and advance the pointer on the sub-sequence you moved it from. Continue merging the two sub-sequences until they are fully sorted.
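
A rough C sketch of the approach, using an array of run start indices and a scratch buffer instead of a struct list (the details are mine and unbenchmarked):

    #include <stdlib.h>
    #include <string.h>

    /* Merge the adjacent sorted runs a[lo..mid-1] and a[mid..hi-1] via tmp. */
    static void merge_runs(int *a, int *tmp, size_t lo, size_t mid, size_t hi)
    {
        size_t i = lo, j = mid, k = lo;

        while (i < mid && j < hi)
            tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
    }

    /* Natural merge sort: detect the existing ascending runs once, then keep
     * merging neighbouring runs until a single run remains. */
    void natural_merge_sort(int *a, size_t n)
    {
        size_t *run = malloc((n + 1) * sizeof *run);   /* run[i] = start index of run i */
        int *tmp = malloc(n * sizeof *tmp);
        size_t nruns = 0, i, r, w;

        if (!run || !tmp) { free(run); free(tmp); return; }

        /* pass 1: record where each ascending run starts */
        for (i = 0; i < n; i++)
            if (i == 0 || a[i] < a[i - 1])
                run[nruns++] = i;
        run[nruns] = n;                                /* sentinel: end of the last run */

        /* keep merging pairs of adjacent runs until only one is left */
        while (nruns > 1) {
            for (r = 0, w = 0; r < nruns; r += 2, w++) {
                if (r + 1 < nruns)
                    merge_runs(a, tmp, run[r], run[r + 1], run[r + 2]);
                run[w] = run[r];
            }
            run[w] = n;                                /* new sentinel */
            nruns = w;
        }
        free(run);
        free(tmp);
    }

With 33-80 runs per array, this does at most 6-7 merge passes over the 1176 elements.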

You may be able to combine this with Oleksi's answer to create a sort of linked list where each node contains a value and the number of times that value occurs in a row within a sub-sequence. Then, when you are merging, if you encounter equal values you add their counts together, merging several identical values at once with a single addition. You would not need to make a hash for this potential optimization.

Paul
  • I will likely write this in `C` some time during the next couple days so you can benchmark it. – Paul Jun 19 '12 at 04:59
  • That sounds good intuitively; I will try that. You can try benchmarking on my data if you get around to implementing it :) – Piotr Lopusiewicz Jun 19 '12 at 21:37
  • I implemented it and it's not bad. My implementation is very naive for now, I think this will be a bit faster than what I have now. – Piotr Lopusiewicz Jun 21 '12 at 14:20
  • Hey @PiotrLopusiewicz sorry I haven't gotten around to this yet. If you've already done it I suppose there is no point in implementing my own version. Let me know how it works out, and if you're able to make any optimizations to it :) – Paul Jun 21 '12 at 17:49

Build a hash table and allocate an array. For each item in the input array, check to see whether that item is in the hash table. If yes, then increment its value. If not, insert it into the hash table with value 1 and append it to your array.

Sort the array. For each item in the array, write that item into the output a number of times equal to its count in the hash table. Fin.

EDIT: You can clear and re-use the hash table for each array you need to sort.

Thom Smith

I would try a hand-coded qsort with the special trick that at each node you store the number, and the count of times it occurs. When you see it again, you increment the count.

Always take the pivot from the middle of the array so that sorted subsequences don't give you a series of bad pivots.

btilly

Given the sorted runs, one reasonable possibility would be to use an in-place merge to combine those runs into larger sorted runs until the entire array is sorted. Note that if the function just needs a C interface (rather than having to be written in C itself), you could use std::inplace_merge from the C++ standard library and give your function an extern "C" linkage specification, so you can call it from C.

Jerry Coffin

GNU's qsort is frankly very good and difficult to beat, but I've recently converted most of my qsort calls to calls to tim_sort by Christopher Swenson, which can be found at https://github.com/swenson/sort/blob/master/sort.h - it's really extremely good. Btw, it explicitly exploits already-sorted segments, like the ones in your data.
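
For reference, basic usage from plain C looks roughly like this (the macro and generated function names follow the pattern described in the project's README; worth double-checking against whatever version you grab):

    /* sort_keys.c -- assumes sort.h from https://github.com/swenson/sort is on the include path */
    #include <stddef.h>

    #define SORT_NAME int
    #define SORT_TYPE int
    #define SORT_CMP(x, y) ((x) - (y))   /* safe here: keys are non-negative, so the difference cannot overflow */
    #include "sort.h"

    void sort_keys(int *a, size_t n)
    {
        int_tim_sort(a, n);              /* generated as SORT_NAME ## _tim_sort */
    }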

I'm doing C++ and have "templatized" Christopher's (pure C macro) code, but that probably doesn't change the fact that it's hands down an absolute winner on your data. The following, in GNU-64 with -O3, leaves little room for argument, I believe:

Sort test:

  • qsort: 00:00:25.4

  • tim_sort: 00:00:08.2

  • std::sort: 00:00:15.2

(that's running each of the sorts a million times).