There are two things going on.
First, Python intentionally limits recursion to a fixed depth. Unlike, say, Scheme, which will keep allocating frames for recursive calls until you run out of memory, Python (or at least the most popular implementation, CPython) will only allocate `sys.getrecursionlimit()` frames (defaulting to 1000) before failing. There are reasons for that,* but they aren't really relevant here; all you need to know is that the limit exists.
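You can see both halves of that in a few lines (a minimal sketch; the exact default and the exception type vary a bit by version):

```python
import sys

print(sys.getrecursionlimit())  # 1000 by default in CPython

def countdown(n):
    if n:
        countdown(n - 1)

countdown(5000)  # RecursionError (RuntimeError before Python 3.5)
```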
Second, as you may already know, while QuickSort is O(N log N) on most lists, it has a worst case of O(N^2), in particular (with the standard pivot rules) on already-sorted lists. And when that happens, your stack depth can end up being O(N). So, if you have 1000 elements arranged in worst-case order and you're already one frame into the stack, you're going to overflow.
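To make that concrete, here's a deliberately naive quicksort, using the first element as the pivot (the names are just for illustration), that blows the default limit on already-sorted input:

```python
def naive_quicksort(lst):
    if len(lst) <= 1:
        return lst
    pivot, rest = lst[0], lst[1:]
    less = [x for x in rest if x < pivot]
    more = [x for x in rest if x >= pivot]
    return naive_quicksort(less) + [pivot] + naive_quicksort(more)

# On sorted input, every partition is empty/full, so the recursion
# goes ~1000 frames deep and raises RecursionError.
naive_quicksort(list(range(1000)))
```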
You can work around this in a few ways:
- Rewrite the code to be iterative, with an explicit stack, so you're only limited by heap memory instead of stack depth (see the first sketch after this list).
- Make sure to always recurse into the shorter side first, rather than the left side. This means that even in the O(N^2) case, your stack depth is still O(log N). But that only works if you've already done the previous step** (the second sketch after this list combines the two).
- Use a random, median-of-three, or other pivot rule so that common cases like already-sorted input are no longer the worst case. (Of course someone can still intentionally DoS your code; there's really no way to avoid that with quicksort.) The Wikipedia article has some discussion of this, and links to the classic Sedgewick and Knuth papers.
- Use a Python implementation with an unlimited stack.***
- Raise the recursion limit to match the input: `sys.setrecursionlimit(max(sys.getrecursionlimit(), len(myList)+CONSTANT))`. This way, you'll fail right off the bat for an obvious reason if you can't make enough space, and usually won't fail otherwise. (But you might; you could be starting the sort already 900 steps deep in the stack…) Still, this is a bad idea.**** Besides, you have to figure out the right `CONSTANT`, which is impossible in general.*****
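For the first option, here's one possible iterative rewrite (a sketch, not the only way to do it), using an explicit list of `(lo, hi)` ranges in place of the call stack:

```python
def quicksort_iterative(lst):
    """Sort lst in place without recursion."""
    stack = [(0, len(lst) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        # Lomuto partition around the last element.
        pivot = lst[hi]
        i = lo
        for j in range(lo, hi):
            if lst[j] < pivot:
                lst[i], lst[j] = lst[j], lst[i]
                i += 1
        lst[i], lst[hi] = lst[hi], lst[i]
        # Push the larger range first so the shorter side is processed
        # next; per the shorter-side-first rule, this keeps the explicit
        # stack O(log N) deep even in the worst case.
        if (i - lo) > (hi - i):
            stack.append((lo, i - 1))
            stack.append((i + 1, hi))
        else:
            stack.append((i + 1, hi))
            stack.append((lo, i - 1))
```

With this, `quicksort_iterative(list(range(100000)))` runs fine without touching the recursion limit.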
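And here's a sketch combining the next two bullets (and the hybrid from footnote **): a random pivot, recursion only into the shorter side, and a loop that manually unwraps the tail recursion on the longer side, keeping the stack depth O(log N):

```python
import random

def quicksort_hybrid(lst, lo=0, hi=None):
    """Sort lst[lo:hi+1] in place with O(log N) stack depth."""
    if hi is None:
        hi = len(lst) - 1
    while lo < hi:
        # Random pivot: already-sorted input is no longer a worst case.
        k = random.randint(lo, hi)
        lst[k], lst[hi] = lst[hi], lst[k]
        pivot, i = lst[hi], lo
        for j in range(lo, hi):
            if lst[j] < pivot:
                lst[i], lst[j] = lst[j], lst[i]
                i += 1
        lst[i], lst[hi] = lst[hi], lst[i]
        # Recurse into the shorter side; loop on the longer one
        # (manual tail call elimination).
        if i - lo < hi - i:
            quicksort_hybrid(lst, lo, i - 1)
            lo = i + 1
        else:
            quicksort_hybrid(lst, i + 1, hi)
            hi = i - 1
```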
* Historically, the CPython interpreter has recursively called itself for recursive Python function calls. And the C stack is fixed in size; if you overrun the end, you could segfault, stomp all over heap memory, or cause all kinds of other problems. This could be changed; in fact, Stackless Python started off as basically just CPython with this change. But the core devs have intentionally chosen not to do so, in part because they don't want to encourage people to write deeply recursive code.
** Or if your language does automatic tail call elimination; Python doesn't. But, as gnibbler points out, you can write a hybrid solution (recurse on the small end, then manually unwrap the tail recursion on the large end) that won't require an explicit stack.
*** Stackless and PyPy can both be configured this way.
**** For one thing, eventually you're going to crash the C stack.
***** The constant isn't really constant; it depends on how deep you already are in the stack (computable non-portably by walking `sys._getframe()` up to the top) and how much slack you need for comparison functions, etc. (not computable at all; you just have to guess).
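If you do want to measure that first half, here's a non-portable sketch; `sys._getframe()` and frame objects are CPython implementation details:

```python
import sys

def current_depth():
    # Walk the frame chain up to the top of the stack.
    depth, frame = 0, sys._getframe()
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth
```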