26

Handling repeated elements in quicksort

I have found a way to handle repeated elements more efficiently in quicksort and would like to know if anyone has seen this done before.

This method greatly reduces the overhead involved in checking for repeated elements, which improves performance both with and without repeated elements present. Typically, repeated elements are handled in a few different ways, which I will first enumerate.

First, there is the Dutch National Flag method, which sorts the array as [ < pivot | == pivot | unsorted | > pivot] (a sketch of this appears below).

Second, there is the method of putting the equal elements to the far left during the sort, so that the array looks like [ == pivot | < pivot | unsorted | > pivot]; after the sort, the == elements are moved to the center.

Third, the Bentley-McIlroy partitioning puts the == elements to both sides, so the sort is [ == pivot | < pivot | unsorted | > pivot | == pivot], and then the == elements are moved to the middle.

The last two methods are done in an attempt to reduce the overhead.
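To make the first method concrete, here is a minimal sketch of a Dutch National Flag partitioning pass (the function name and interface are mine, just for illustration):

// Sketch: one Dutch National Flag pass. Afterwards,
// a[left..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..right] > pivot;
// during the loop, the region between mid and hi is the unsorted part.
void dnf_partition(int a[], long left, long right, long *lt, long *gt){
    int pivot = a[left], temp;
    long lo = left, mid = left, hi = right;
    while(mid <= hi){
        if(a[mid] < pivot){
            temp = a[lo]; a[lo] = a[mid]; a[mid] = temp;
            lo++; mid++;
        } else if(a[mid] > pivot){
            temp = a[mid]; a[mid] = a[hi]; a[hi] = temp;
            hi--;
        } else {
            mid++;
        }
    }
    *lt = lo;
    *gt = hi;
}

Notice that every element can cost two comparisons per pass just so the equal elements can be recognized; that per-element overhead is what I am trying to avoid.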

My Method

Now, let me explain how my method improves the quicksort by reducing the number of comparisons. I use two quicksort functions together rather than just one.

The first function I will call q1 and it sorts an array as [ < pivot | unsorted | >= pivot].

The second function I will call q2 and it sorts the array as [ <= pivot | unsorted | > pivot].

Let's now look at the usage of these in tandem in order to improve the handling of repeated elements.

First of all, we call q1 to sort the whole array. It picks a pivot, which we will refer to as pivot1 from here on, and partitions around pivot1. Thus, our array is sorted to this point as [ < pivot1 | >= pivot1 ].

Then, for the [ < pivot1] partition, we send it to q1 again; that part is fairly normal, so let's sort the other partition first.

For the [ >= pivot1] partition, we send it to q2. q2 chooses a pivot, which we will refer to as pivot2, from within this sub-array and sorts it into [ <= pivot2 | > pivot2].

If we look now at the entire array, our sorting looks like [ < pivot1 | >= pivot1 and <= pivot2 | > pivot2]. This looks very much like a dual-pivot quicksort.

Now, let's return to the subarray inside of q2 ([ <= pivot2 | > pivot2]).

For the [ > pivot2] partition, we just send it back to q1 which is not very interesting.

For the [ <= pivot2] partition, we first check whether pivot1 == pivot2. If they are equal, then this partition is already sorted because it consists entirely of equal elements! If the pivots aren't equal, then we just send this partition to q2 again, which picks a pivot (call it pivot3), sorts, and if pivot3 == pivot1, then it does not have to sort the [ <= pivot3] partition, and so on.

Hopefully, you get the point by now. The improvement with this technique is that equal elements are handled without having to check whether each element is also equal to the pivots. In other words, it uses fewer comparisons. For example, on an array where every element is equal, q1 makes one pass, q2 makes one more, notices that its pivot matches pivot1, and stops; no further recursion is needed.

There is one other possible improvement that I have not tried yet, which is to check in qs2 whether the size of the [ <= pivot2] partition is rather large (or the [ > pivot2] partition is very small) compared to the size of its total subarray, and in that case to do a more standard check for repeated elements (one of the methods listed above); a sketch follows.
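As a rough sketch of what that might look like (untested, and the 7/8 threshold is an arbitrary placeholder), the tail of qs2 from the source below could be replaced along these lines:

void qs1(int a[], long left, long right);   // both given in full below
void qs2(int a[], long left, long right);

// Untested sketch of a modified tail for qs2. Here j is the index where
// qs2 just placed its pivot. When [ <= pivot ] covers most of the
// subarray, sweep the elements equal to the pivot up next to it so they
// are never visited again, then recurse only on the strictly smaller ones.
void qs2_tail_sketch(int a[], long left, long j, long right){
    int pivot = a[j], temp;
    if(right > j + 1)
        qs1(a, j + 1, right);
    if(pivot == a[left - 1] || left >= j - 1)
        return;                                   // already handled
    if((j - left) * 8 >= (right - left + 1) * 7){ // placeholder threshold
        long top = j;
        for(long k = j - 1; k >= left; k--)
            if(a[k] == pivot){                    // move equal element up
                temp = a[k]; a[k] = a[--top]; a[top] = temp;
            }
        if(left < top - 1)
            qs2(a, left, top - 1);                // only < pivot remains
    } else {
        qs2(a, left, j - 1);
    }
}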

Source Code

Here are two very simplified qs1 and qs2 functions. They use the Sedgewick converging-pointers method of partitioning. They can obviously be optimized much further (they choose pivots extremely poorly, for instance), but this is just to show the idea. My own implementation is longer, faster, and much harder to read, so let's start with this:

// qs1 and qs2 call each other, so qs2 needs a forward declaration.
void qs2(int a[], long left, long right);

// qs1 sorts into [ < p | >= p ]
void qs1(int a[], long left, long right){
    // Pick a pivot and set up some indices
    int pivot = a[right], temp;
    long i = left - 1, j = right;
    // do the sort
    for(;;){
        while(a[++i] < pivot);
        while(a[--j] >= pivot) if(i == j) break;
        if(i >= j) break;
        temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
    // Put the pivot in the correct spot
    temp = a[i];
    a[i] = a[right];
    a[right] = temp;

    // send the [ < p ] partition to qs1
    if(left < i - 1)
        qs1(a, left, i - 1);
    // send the [ >= p] partition to qs2
    if( right > i + 1)
        qs2(a, i + 1, right);
}

// qs2 sorts into [ <= p | > p ]
void qs2(int a[], long left, long right){
    // Pick a pivot and set up some indices
    int pivot = a[left], temp;
    long i = left, j = right + 1;
    // do the sort
    for(;;){
        while(a[--j] > pivot);
        while(a[++i] <= pivot) if(i == j) break;
        if(i >= j) break;
        temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
    // Put the pivot in the correct spot
    temp = a[j];
    a[j] = a[left];
    a[left] = temp;

    // Send the [ > p ] partition to qs1
    if( right > j + 1)
        qs1(a, j + 1, right);
    // Here is where we check the pivots. a[left-1] is the other pivot
    // we need to compare with; qs2 is only ever called on a subarray
    // whose left neighbor holds an earlier pivot, so a[left-1] is
    // always valid. This handles the repeated elements.
    if(pivot != a[left-1])
        // since the pivots don't match, we pass [ <= p ] on to qs2
        if(left < j - 1)
            qs2(a, left, j - 1);
}

I know that this is a rather simple idea, but it gives a pretty significant improvement in runtime when I add in the standard quicksort improvements (median-of-3 pivot choosing and insertion sort for small arrays, for a start). If you are going to test using this code, only do it on random data because of the poor pivot choosing (or improve the pivot choice). To use this sort you would call:

qs1(array,0,indexofendofarray);
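For completeness, here is a minimal driver of the kind I use to try it out (a sketch; the array size and value range are arbitrary choices):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void qs1(int a[], long left, long right);   // defined above
void qs2(int a[], long left, long right);

int main(void){
    enum { N = 2000000 };                   // matches the benchmark size below
    int *a = malloc(N * sizeof *a);
    if(a == NULL) return 1;
    srand((unsigned)time(NULL));
    for(long k = 0; k < N; k++)
        a[k] = rand();                      // random data, as recommended above
    qs1(a, 0, N - 1);
    for(long k = 1; k < N; k++)             // sanity-check the result
        if(a[k - 1] > a[k]){ printf("not sorted!\n"); free(a); return 1; }
    printf("sorted %d elements\n", N);
    free(a);
    return 0;
}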

Some Benchmarks

If you want to know just how fast it is, here is a little bit of data for starters. This uses my optimized version, not the one given above. However, the one given above is still much closer in time to the dual-pivot quicksort than to std::sort.

On highly random data with 2,000,000 elements, I get these times (from sorting several consecutive datasets):

std::sort - 1.609 seconds  
dual-pivot quicksort - 1.25 seconds  
qs1/qs2 - 1.172 seconds

Where std::sort is the C++ Standard Library sort, the dual-pivot quicksort is the one that came out several months ago from Vladimir Yaroslavskiy, and qs1/qs2 is my quicksort implementation.

On much less random data, with 2,000,000 elements generated with rand() % 1000 (which means that each value has roughly 2,000 copies), the times are:

std::sort - 0.468 seconds  
dual-pivot quicksort - 0.438 seconds  
qs1/qs2 - 0.407 seconds

There are some instances where the dual-pivot quicksort wins out, and I do realize that the dual-pivot quicksort could be optimized more, but the same could safely be said of my quicksort.

Has anyone seen this before?

I know this is a long question/explanation, but have any of you seen this improvement before? If so, then why isn't it being used?

Justin Peel
  • What you want to do is an academic literature search. R Samuel Klatchko gave you links to the best journals in your previous question on this topic, and theMick told you how to get started if you don't know what you're doing. – dmckee --- ex-moderator kitten Jan 20 '10 at 23:27
  • Is my formatting a little better now? Do you have any suggestions? Is there a better site to post this on? – Justin Peel Jan 20 '10 at 23:47
  • Whatever you do, sorting will take O(N log N) comparisons in the average case, as per the decision-tree model proof given in Allen Weiss's book (Data Structures and Algorithms in C++). – Boolean Jan 21 '10 at 00:03
  • Yes, this is still an average O(N log N) sort. The coefficient is just smaller than in others that handle repeated elements. – Justin Peel Jan 21 '10 at 00:32
  • @Raccha: Not familiar with that book, but I think the O(N log N) lower bound assumes that only 1 comparison function is available. Of course I'd be very surprised if having both `<=` and `>=` magically made asymptotically faster algorithms possible! – j_random_hacker Jul 12 '12 at 16:26

5 Answers

7

Vladimir Yaroslavskiy | 11 Sep 12:35 | Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort

Visit http://permalink.gmane.org/gmane.comp.java.openjdk.core-libs.devel/2628

jun
  • This is a broken link now. It looks like http://mail.openjdk.java.net/pipermail/core-libs-dev/2009-September/002630.html may be a working link to the same thing, but it's not clear whether it does describe the same algorithm as the OP's. – kaya3 May 08 '21 at 20:24
5

To answer your question: no, I have not seen this approach before. I'm not going to profile your code and do the other hard work, but perhaps the following are next steps/considerations in formally presenting your algorithm. In the real world, sorting algorithms are implemented to have:

Good scalability / complexity and Low overhead

Scaling and overhead are obvious and easy to measure. When profiling sorting, in addition to time, measure the number of comparisons and swaps (one way to do that is sketched below). Performance on large files will also depend on disk seek time. For example, merge sort works well on large files with a magnetic disk. (See also Quick Sort Vs Merge Sort.)
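One low-tech way to collect those counts, as a sketch (the helper names are made up for illustration): funnel every comparison and swap through counting helpers.

// Sketch: counters plus helpers so a sort can report comparisons
// and swaps alongside wall-clock time.
static long long n_cmp = 0, n_swap = 0;

static int less_than(int x, int y){
    n_cmp++;                       // tally each comparison
    return x < y;
}

static void swap_items(int a[], long i, long j){
    int temp = a[i]; a[i] = a[j]; a[j] = temp;
    n_swap++;                      // tally each swap
}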

Wide range of inputs with good performance

There's lots of data that needs sorting. And applications are known to produce data in patterns, so it is important to make sure the sort is resilient against poor performance under certain patterns. Your algorithm optimizes for repeated numbers. What if all numbers are repeated, but only once each (i.e. seq 1000>file; seq 1000>>file; shuf file)? What if the numbers are already sorted? Sorted backwards? What about a pattern of 1,2,3,1,2,3,1,2,3,1,2,3? Or 1,2,3,4,5,6,7,6,5,4,3,2,1? Or 7,6,5,4,3,2,1,2,3,4,5,6,7? Poor performance in one of these common scenarios is a deal breaker! Before comparing against a published general-purpose algorithm, it is wise to have this analysis prepared; a generator for such patterns is sketched below.
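A tiny generator for such inputs might look like this (a sketch; the pattern numbering is mine):

#include <stdlib.h>

// Sketch: fill a[0..n-1] with one of the patterns mentioned above.
void fill_pattern(int a[], long n, int pattern){
    for(long k = 0; k < n; k++){
        switch(pattern){
        case 0: a[k] = rand();            break; // random
        case 1: a[k] = (int)k;            break; // already sorted
        case 2: a[k] = (int)(n - k);      break; // sorted backwards
        case 3: a[k] = (int)(k % 3) + 1;  break; // 1,2,3,1,2,3,...
        case 4: a[k] = (int)(k < n / 2 ? k : n - k); break; // up then down
        default: a[k] = 42;               break; // all equal
        }
    }
}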

Low-risk of pathological performance

Of all the permutations of inputs, there is one that performs worse than the others. How much worse does it perform than average? And how many permutations will provide similarly poor performance?

Good luck on your next steps!

William Entriken
  • Yes, although a sorting algorithm might also end up in the "use-case-specific box", taking advantage of the properties of specific datasets to achieve better performance, at the cost of being worse in general. Thus, an algorithm that has pathological performance on some inputs might still be valuable, just not as a general algorithm. – spectras May 26 '20 at 11:23
0

It's a great improvement, and I'm sure it's been implemented specifically if you expect a lot of equal objects. There are many off-the-wall tweaks of this kind.

If I understand all you wrote correctly, the reason it's not generally "known" is that it does improve the basic O(n^2) performance. That means, double the number of objects, quadruple the time. Your improvement doesn't change this unless all objects are equal.

Martin
  • I think you mean "it doesn't improved the basic O(n^2) performance" – David Oneill Jan 20 '10 at 23:17
  • I think that you're missing the point. To have a decent quicksort, you need to be able to handle repeated elements. The current ways of doing this add much more to the quicksort than my method. The O(N^2) worst case performance comes from repeated elements and/or choosing bad pivots. This improvement tackles the repeated elements part and a median-of-3 pivot choosing method or picking random pivots can help with choosing better pivots. – Justin Peel Jan 20 '10 at 23:31
  • I think you meant 'I think you mean "it doesn't improve the basic O(n^2) performance"' – littlegreen Jan 20 '10 at 23:31
  • n^2 is just the worst case, without much practical consequence. As I have to run it on a real machine, where c1*O(n^2) = c2*O(n log n), I want to know the constants! – Stephan Eggermont Jan 20 '10 at 23:56
  • Yes, I agree: O(n^2) is of little practical value, but that's my theory on why you don't find this type of improvement published. Well, actually, it might be because there are other sorting methods that might be more interesting to improve on. I do like the way you're handling repeated elements. – Martin Jan 21 '10 at 01:08
  • By the way, with randomly chosen pivots, this method makes worst-case behavior extremely rare. It basically eradicates it if I check explicitly for repeated elements when the [ > pivot] partition of qs2 has a size < 2. – Justin Peel Jan 22 '10 at 15:06
-1

std::sort is not exactly fast.

Here are the results I get comparing it to a randomized, parallel, nonrecursive quicksort:

pnrqSort (longs):
1 000 000: 36ms (items per ms: 27777.8)
5 000 000: 140ms (items per ms: 35714.3)
10 000 000: 296ms (items per ms: 33783.8)
50 000 000: 1s 484ms (items per ms: 33692.7)
100 000 000: 2s 936ms (items per ms: 34059.9)
250 000 000: 8s 300ms (items per ms: 30120.5)
400 000 000: 12s 611ms (items per ms: 31718.3)
500 000 000: 16s 428ms (items per ms: 30435.8)

std::sort (longs):
1 000 000: 134ms (items per ms: 7462.69)
5 000 000: 716ms (items per ms: 6983.24)

std::sort (vector of longs):
1 000 000: 511ms (items per ms: 1956.95)
2 500 000: 943ms (items per ms: 2651.11)

Since you have an extra method, it is going to cause more stack use, which will ultimately slow things down. I don't know why median-of-3 is used, because it's a poor method; with random pivot points, quicksort never has big issues with uniform or presorted data, and there's no danger of intentionally crafted median-of-3 killer data.

  • Yes, I've thought about using other pivot choosing methods including randomized pivots. That isn't the point. Also, notice that yours is both nonrecursive and parallel. Of course it will be faster! I used recursion because it is much simpler to implement and easier for people to quickly understand. My method can be made both nonrecursive and parallel as well. Yes, std::sort is not the fastest, but it provides a common function for comparison. The dual-pivot quicksort, however, is quite fast for being recursive and serial. – Justin Peel Jan 21 '10 at 01:34
  • So what is the point exactly? Apparently none, to downvote my response for no reason. Using two pivot methods will be twice the stack overhead, as I pointed out, and also as I pointed out it doesn't gain you anything over existing methods, so what's the point? Apparently none, just as with the question itself. – Charles Eli Cheese Jan 21 '10 at 04:35
  • I downvoted your response because you were comparing apples to oranges. Comparing a nonrecursive, parallel quicksort with a recursive, serial quicksort is meaningless. You didn't even specify how many processors were being used. Using two different pivot methods does not double the number of stack calls - it is the same number of calls as using your basic quicksort. If you add the total number of stack calls to each function (qs1 and qs2) and to a basic quicksort you should get the same number. Also, maybe you missed that this can be used to improve nonrecursive and parallel methods. – Justin Peel Jan 21 '10 at 14:48
  • Actually, the number of calls to qs1 and qs2 combined will be less than the stack calls in a basic quicksort when there is repeated data. – Justin Peel Jan 21 '10 at 16:15
-1

Nobody seems to like your algorithm, but I do. It seems to me it's a nice way to redo classic quicksort in a manner now safe for use with highly repeated elements. Your q1 and q2 subalgorithms, it seems to me, are actually the SAME algorithm except that the < and <= operators are interchanged (plus a few other things), which if you wanted would allow you to write shorter pseudocode for this (though it might be less efficient). I recommend you read J. L. Bentley and M. D. McIlroy, "Engineering a Sort Function", Software: Practice and Experience 23(11), Nov 1993, pp. 1249-1265, available at http://www.skidmore.edu/~meckmann/2009Spring/cs206/papers/spe862jb.pdf, to see the tests they put their quicksort through. Your idea might be nicer and/or better, but it needs to run the gauntlet of the kinds of tests they tried, using some particular pivot-choosing method. Find one that passes all their tests without ever suffering quadratic runtime. Then, if in addition your algorithm is both faster and nicer than theirs, you would clearly have a worthwhile contribution.

The "Tukey Ninther" thing they use to generate a pivot seems to me is usable by you too and will automatically make it very hard for the quadratic time worst case to arise in practice. I mean, if you just use median-of-3 and try the middle and two end elements of the array as your three, then an adversary will make the initial array state be increasing then decreasing and then you'll fall on your face with quadratic runtime on a not-too-implausible input. But with Tukey Ninther on 9 elements, it's pretty hard for me to construct a plausible input which hurts you with quadratic runtime.

Another view & a suggestion: think of the combination of q1 splitting your array and then q2 splitting the right subarray as a single q12 algorithm producing a 3-way split of the array. Now, you need to recurse on the 3 subarrays (or only 2 if the two pivots happen to be equal). Always recurse on the SMALLEST of the subarrays FIRST and the largest LAST, and do not implement the largest one as a recursion; rather, just stay in the same routine and loop back up to the top with a shrunk window (see the sketch below). That way you have one fewer recursive call in q12 than you would have had, but the main point is that it is now IMPOSSIBLE for the recursion stack ever to get more than O(log N) deep. This solves another annoying worst-case problem quicksort can suffer, while also making your code a bit faster.
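Here is a sketch of that shape on a plain single-pivot quicksort (the partition function is a stand-in, not the question's code; the same structure extends to the three-way q12 split):

// Stand-in Lomuto partition, just to make the sketch self-contained.
static long partition(int a[], long left, long right){
    int pivot = a[right], temp;
    long p = left;
    for(long k = left; k < right; k++)
        if(a[k] < pivot){
            temp = a[k]; a[k] = a[p]; a[p] = temp;
            p++;
        }
    temp = a[p]; a[p] = a[right]; a[right] = temp;
    return p;
}

// Recurse on the smaller side, loop on the larger: the recursion
// depth is then bounded by O(log N) no matter what the pivots do.
void quicksort_bounded(int a[], long left, long right){
    while(left < right){
        long p = partition(a, left, right);
        if(p - left < right - p){
            quicksort_bounded(a, left, p - 1);  // small side: recurse
            left = p + 1;                       // large side: iterate
        } else {
            quicksort_bounded(a, p + 1, right);
            right = p - 1;
        }
    }
}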