1

When I send to this method an array of size > 900000, I get a StackOverflowError about half of the time. Please, help me understand why this is happening and how to fix it.

public static void quicksort2(int[] xs, int lo, int hi){
    int h,l,p,t;
    if (lo<hi){
        l = lo;
        h = hi;
        p = xs[hi];

        do {
            while( (l<h) && (xs[l] <= p) )
                l= l+1;

            while( (h > l) && (xs[h] >= p))
                h = h-1;

            if (l < h){
                t = xs[l];
                xs[l] = xs[h];
                xs[h] = t;
            }
        } while (l<h);
        xs[hi] = xs[l];
        xs[l] = p;

        quicksort2(xs, lo, l-1);
        quicksort2(xs, l+1,hi);

    }
}

The specific error I get is:

Exception in thread "main" java.lang.StackOverflowError
at sort.Sorters.quicksort2(Sorters.java:117)
at sort.Sorters.quicksort2(Sorters.java:117)
at sort.Sorters.quicksort2(Sorters.java:118)
at sort.Sorters.quicksort2(Sorters.java:117)
at sort.Sorters.quicksort2(Sorters.java:117)

And then a bunch more of this same thing. The two lines referenced are the recursive calls.

  • What error do you get? Show us the stacktrace. –  Apr 23 '18 at 14:08
  • 6
    The algorithm is recursive, so every recursion level expends some of the stack. Once your data are large enough, the default stack size gets exhausted. – 9000 Apr 23 '18 at 14:09
  • Here's a good question to look at: https://stackoverflow.com/q/19094283/10077 – Fred Larson Apr 23 '18 at 14:12
  • @9000 True, but that doesn't answer the OP entirely. My guess is that a base case is being missed, resulting in infinite calls, or something like this. – Tim Biegeleisen Apr 23 '18 at 14:13
  • 1
    Side note: Java does not have tail call optimzation. Thus, every recursive algorithm may generate a `StackOverflowError`, given that the input is lare enough. – Turing85 Apr 23 '18 at 14:14
  • @TimBiegeleisen: I don't think that infinite recursion just kicks in at "an array of size > 900000"; I think it's a regular stack exhaustion. – 9000 Apr 23 '18 at 14:15
  • 2
    900K, while apparently a large number, is actually tiny from the point of divide and conquer, which runs `O(lgN)` in number of recursive calls. 20 divisions by two take 900K down to 1 :-) – Tim Biegeleisen Apr 23 '18 at 14:16
  • BTW there's a variation of quicksort without recursion or stack at all: https://link.springer.com/chapter/10.1007/BFb0016252 – 9000 Apr 23 '18 at 14:16
  • I don't think it is getting infinite calls since I have never been able to stack overflow with an array of size < 800,000 – Max Dewaele Apr 23 '18 at 14:16
  • How do you call the method, or in what context do you use it ? I tested your code for an array of 1000000 elements like 10 times. No ``StackOverflow`` ... – Schidu Luca Apr 23 '18 at 14:22
  • @TimBiegeleisen: This makes sense when you are able to nicely divide the array by half every time. Quicksort though is known to have quadratic performance characteristics in degenerate cases; I suppose that on certain cases when pivots keep getting selected close to boundaries, you almost do not reduce the size of the task, and the recursion depth grows much faster than logarithmically. This is why a random shuffle is recommended before sorting. – 9000 Apr 23 '18 at 14:23
  • @9000 You might be right, and if the error doesn't happen for, say a 500K array, then this could be the explanation. – Tim Biegeleisen Apr 23 '18 at 14:24

1 Answers1

0

Your implementation of quicksort is fine. It could be a little improved by having better variable names, having curly brackets in the while statements, using l++ and h-- instead of l=l+1 and h=h-1 and some other minor details alike.

The problem is when you do that:

int[] willBlowUp = new int[1000000];
quicksort2(willBlowUp, 0, willBlowUp.size - 1);

This will produce a StackOverflowError because it is a degenerated case that makes the call stack as high as the input. And I'm not aware of any JVM that has a stack dept large enough for a million recursive calls.

The given array consists solely of a million of zeros. So, sorting it in reverse order or shuffing before sorting is hopeless. The first while inside the do loop will always move l until the very end of the partition, leaving the second part for the recursive call with only a solitary element and the first part with everything else. This means that at each recursion level, the unsorted area of the array decreased by only 1 item, when ideally it should decrease by half.

Also, the performance is quite bad due to the first while loop inside the do having to do something like a half trillion iterations in the total (if it was not the StackOverflowError).

This special case could be solved if you use fat quicksort and shuffling, but yet there is always a case, even if statistically improbable that this problem will happen again. Let's suppose that (even after shuffling) you got an array that consists of 1, 2, 3, 4, 5, ..., n. Quicksort will behave badly on such cases.

There are some solutions for this, but they are all complicated and messy, like hybridizing the algorithm, selectively switching to another one on some subsets of the array, analyzing what is the structure of the data prior to choosing the pivot element or reshuffling if the output of a shuffle is bad.