
I've heard that in quicksort it is better to call recursion on the smaller subarray first. For example, if 5 was the pivot and the data gets partitioned to 4,1,3,5,7,6, then it's better to sort the subarray 7,6 first because it contains two elements whereas 4,1,3 contains three.

The source of all knowledge gives the following pseudocode for quicksort:

quicksort(A, i, k):
  if i < k:
    p := partition(A, i, k)
    quicksort(A, i, p - 1)
    quicksort(A, p + 1, k)

So an algorithm that recurses on the smaller subarray first would look something like this:

quicksort(A, i, k):
  if i < k:
    p := partition(A, i, k)
    if p - i > k - p:
      /* the left part (i to p-1) is larger than the right part (p+1 to k),
         so recurse on the smaller right part first */
      quicksort(A, p + 1, k)
      quicksort(A, i, p - 1)
    else:
      quicksort(A, i, p - 1)
      quicksort(A, p + 1, k)

I profiled code like this written in Java and it does appear to be faster, but why? At first I thought it had to do with tail recursion optimization, but then I realized that was definitely wrong, as Java doesn't support it.
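For reference, a minimal Java sketch of this idea (with a standard Lomuto partition added only so the example is self-contained; it is not the exact code that was benchmarked) looks something like:

static void quicksort(int[] a, int i, int k) {
    if (i < k) {
        int p = partition(a, i, k);
        if (p - i > k - p) {           // left part is larger...
            quicksort(a, p + 1, k);    // ...so recurse on the smaller right part first
            quicksort(a, i, p - 1);
        } else {
            quicksort(a, i, p - 1);    // left part is smaller, recurse on it first
            quicksort(a, p + 1, k);
        }
    }
}

// Standard Lomuto partition (added only to make the sketch self-contained).
static int partition(int[] a, int i, int k) {
    int pivot = a[k];
    int p = i - 1;
    for (int j = i; j < k; j++) {
        if (a[j] <= pivot) {
            p++;
            int tmp = a[p]; a[p] = a[j]; a[j] = tmp;
        }
    }
    int tmp = a[p + 1]; a[p + 1] = a[k]; a[k] = tmp;
    return p + 1;
}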

Why is sorting the smaller subarray first faster? This article claims it should be.

Celeritas
  • That article doesn't say anything about it being faster that I can see. It just says that it will minimize stack depth if you use tail recursion. – interjay Aug 25 '14 at 09:01
  • The linked article talks about stack usage. Did I miss the discussion on speed there? If yes, please add a quote from the article to make your question clearer. – anatolyg Aug 25 '14 at 09:03
  • Did you look up quicksort on Wikipedia? It analyses the runtime of Quicksort quite well (hint it tries to reduce the number of comparisons needed). – invalid_id Aug 25 '14 at 09:23
  • Care to show the benchmark you used? Also, did you [statistically prove](http://en.wikipedia.org/wiki/Statistical_proof) the claim that the order of recursion matters? – amit Aug 25 '14 at 09:49
  • @MooingDuck maybe that's what I was thinking: if you can loop one, why couldn't you loop the other? – Celeritas Aug 25 '14 at 22:57
  • @Celeritas: No wait, I got it backwards. You recurse the smaller array, then loop the larger. That guarantees the stack depth is _at most_ 64 calls deep (assuming 64bit code). Recursing on the larger would guarantee that the stack depth was at _least_ 64 calls deep, with no effective maximum. – Mooing Duck Aug 25 '14 at 23:08
  • @MooingDuck can you explain your comment above? It seems to chime with what Skiena says in his [book](http://www.algorist.com/): _"Assuming that your compiler is smart enough to remove tail recursion, you can minimize run-time memory by processing the smaller partition before the larger one. Since successive stored calls are at most half as large as the previous one, only O(lg n) stack space is needed."_ But I'm not able to understand the reasoning behind it. Since the second recursive call is _always_ made, how does it matter whether it was called first or last? And where is 64 coming from? – Abhijit Sarkar Jan 28 '19 at 02:39
  • Possible duplicate of [Quicksort - which sub-part should be sorted first?](https://stackoverflow.com/questions/12792738/quicksort-which-sub-part-should-be-sorted-first) – Abhijit Sarkar Jan 28 '19 at 03:01
  • @AbhijitSarkar: On a 64 bit architecture, there can be at most 2^64 items in the array before you run out of virtual memory. Each partition will have a bigger side and a smaller side. The bigger side will always have more than half the items, and the smaller side will always have less than half. By recursing on the smaller side, each step is therefore always less than half of the prior data, taking at most log2(2^64)=64 calls to reach an empty partition. (A sketch of this recurse-smaller/loop-larger pattern appears after these comments.) – Mooing Duck Jan 28 '19 at 18:28
  • @MooingDuck The max size of an array on the JVM is roughly the max int, which is way smaller than 2^64. I don't know about the other languages, but I think the magic number 64 is not really relevant here. What matters is that the last call doesn't use the stack, and is converted to a goto. The link I referred to above has a good answer based on tail recursion. – Abhijit Sarkar Jan 28 '19 at 22:33
  • @AbhijitSarkar I was referring to the _hardware_. On a processor with 64 bit memory addresses, each process is limited to 2^64 bits of virtual memory. Due to this limit, you know that the quicksort of items in memory will be limited to a depth of 64 calls regardless of language. A system with 8 bit pointers would take no more than 8 recursive calls. – Mooing Duck Jan 28 '19 at 22:50
  • OK, although that'd be a limitation for _any_ recursive algo, not just QuickSort. Also, perhaps you mean "2^64 _bytes_ of virtual memory", not bits, since the smallest addressable unit on any modern architecture is a byte. – Abhijit Sarkar Jan 29 '19 at 00:36
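
To illustrate the pattern discussed in these comments, here is a minimal Java sketch (mine, not from the question) that recurses only on the smaller side and loops over the larger side; the partition routine is assumed to be any standard in-place partition:

// Recurse on the smaller partition, iterate over the larger one.
// Each recursive call handles fewer than half of the current range's elements,
// so the stack depth is bounded by O(log n).
static void quicksort(int[] a, int lo, int hi) {
    while (lo < hi) {
        int p = partition(a, lo, hi);   // any standard in-place partition (assumed)
        if (p - lo < hi - p) {
            quicksort(a, lo, p - 1);    // smaller left side: recurse
            lo = p + 1;                 // larger right side: keep looping
        } else {
            quicksort(a, p + 1, hi);    // smaller right side: recurse
            hi = p - 1;                 // larger left side: keep looping
        }
    }
}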

3 Answers


Maybe I'm missing something subtle here, but if you recurse on your two cases in a different order, then you're just traversing the recursion tree depth-first after potentially swapping the children sub-trees at some nodes. The tree is still isomorphic to the original recursion tree, and the set of recursion depths at the base cases is still the same. The only way I can see getting a demonstrable reduction in recursion depth, or some other recursion-related speed-up, is something like this: recurse on the smaller subarray, then pick a pivot for the larger subarray (without recursing), and then recurse on the two subcases of the larger subarray. This turns your recursion tree into a ternary recursion tree instead of a binary one, which should typically have a lower maximum depth.
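
As a rough sketch of how that ternary split could be coded (this is my reading of the suggestion, in Java, with an assumed standard partition(a, lo, hi) routine):

static void quicksortTernary(int[] a, int lo, int hi) {
    if (lo >= hi) return;                          // 0 or 1 elements: nothing to do
    int p = partition(a, lo, hi);                  // assumed standard in-place partition
    boolean leftIsSmaller = (p - lo < hi - p);
    int smallLo = leftIsSmaller ? lo : p + 1;
    int smallHi = leftIsSmaller ? p - 1 : hi;
    int largeLo = leftIsSmaller ? p + 1 : lo;
    int largeHi = leftIsSmaller ? hi : p - 1;
    quicksortTernary(a, smallLo, smallHi);         // recurse on the smaller side
    if (largeLo < largeHi) {
        int q = partition(a, largeLo, largeHi);    // split the larger side without recursing yet
        quicksortTernary(a, largeLo, q - 1);       // ...then recurse on its two halves,
        quicksortTernary(a, q + 1, largeHi);       // giving each tree node up to three children
    }
}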

user2566092
  • This does not answer the question. Improvement in performance does not mean the recursion tree is "smaller". For example, if recursing on the smaller array first has better cache performance, you will get the same recursion tree, but better performance. (Just hypothesizing here, not claiming anything about the cache.) – amit Aug 25 '14 at 22:15

The Java JIT compiler can eliminate tail recursion, which leads to reduced stack memory use. Reduced stack memory may reduce cache pressure, allowing a larger partition to fit in a faster cache level.

Another minor improvement is a reduced number of function calls, and thus a small reduction in the number of instructions executed. But instruction count usually has very low correlation with performance when operating on data that doesn't fit in cache.

Pauli Nieminen
  • Isn't that wrong? Java in fact doesn't "eliminate tail recursion": http://stackoverflow.com/questions/4401052/does-java-support-tail-recursion – Celeritas Aug 25 '14 at 16:38
  • The bytecode and the JVM don't eliminate it, but JIT compilers do if there aren't any blocking requirements, like maintaining security information on the stack. For example, IBM documents it: http://publib.boulder.ibm.com/infocenter/javasdk/v1r4m2/index.jsp?topic=%2Fcom.ibm.java.doc.diagnostics.142j9%2Fhtml%2Fhowjitopt.html – Pauli Nieminen Aug 27 '14 at 10:42

I watched a Leslie Lamport talk recently in which he pointed out that Quicksort, as originally presented, is not necessarily a recursive algorithm, even though many programmers choose to implement it recursively.

Instead of recursing into the partitions, you can simply add them to a queue of ranges that still must be sorted. The algorithm iterates until the queue is empty. Whether you push and pull the range from the head or the tail determines whether you progress depth-first or breadth-first.

quicksort(a, begin, end) {
  queue<range> q;
  q.push_back(range(begin, end));       // seed the queue with the full range
  while (!q.empty()) {
    r = q.pop_front();
    pivot = partition(a, r.begin, r.end);
    if (pivot > r.begin) q.push_back(range(r.begin, pivot));      // left subrange
    if (pivot + 1 < r.end) q.push_back(range(pivot + 1, r.end));  // right subrange
  }
}

A recursive implementation just uses the call stack as a LIFO queue, and therefore naturally proceeds in a depth-first manner, and I don't really think it matters which side you process next. The queue length is still bounded by the recursion depth.

But if you're working in a breadth-first manner, by using a FIFO queue as shown in the pseudo-code, then the order can matter. The larger partition will add more subranges to the queue, so you'd rather do that side once the queue is already as small as possible. To keep the queue smaller, you want to process the smaller subrange first.
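For example, a breadth-first Java version could compare the two subranges and enqueue the smaller one first (this sketch uses inclusive [lo, hi] ranges and assumes a standard partition(a, lo, hi) routine; both choices are mine, not part of the answer above):

import java.util.ArrayDeque;
import java.util.Deque;

// Breadth-first quicksort that enqueues the smaller subrange before the larger
// one, so the FIFO queue stays as short as possible.
static void quicksortBreadthFirst(int[] a, int lo, int hi) {
    Deque<int[]> q = new ArrayDeque<>();
    q.addLast(new int[] { lo, hi });
    while (!q.isEmpty()) {
        int[] r = q.pollFirst();                  // FIFO: breadth-first order
        if (r[0] >= r[1]) continue;               // 0 or 1 elements: nothing to sort
        int p = partition(a, r[0], r[1]);         // assumed standard in-place partition
        int[] left = { r[0], p - 1 };
        int[] right = { p + 1, r[1] };
        int[] smaller = (left[1] - left[0] <= right[1] - right[0]) ? left : right;
        int[] larger = (smaller == left) ? right : left;
        q.addLast(smaller);                       // enqueue the smaller subrange first
        q.addLast(larger);
    }
}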

Adrian McCarthy