
BACKGROUND

When playing around, I often write simple recursive functions looking something like:

def f(a,b):
    if a>=0 and b>=0:
        return min( f(a-1,b) , f(b,a-1) ) # + some cost that depends on a,b
    else:
        return 0

(For example, when computing weighted edit distances, or evaluating recursively defined mathematical formulas.)

I then use a memoizing decorator to cache the results automatically.
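Such a decorator can be as small as a dict keyed on the argument tuple. A minimal sketch (memoized is an illustrative name, and the cost of 1 is a placeholder for the real cost term):

```python
import functools

def memoized(func):
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        # positional, hashable arguments only, which is all f needs
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoized
def f(a, b):
    if a >= 0 and b >= 0:
        return min(f(a - 1, b), f(b, a - 1)) + 1  # placeholder cost of 1
    else:
        return 0
```

With the cache in place, each distinct (a, b) pair is computed once; the recursion depth problem described next is unaffected.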

PROBLEM

When I try something like f(200,10) I get:

RuntimeError: maximum recursion depth exceeded

This is as expected, because the recursive implementation exceeds Python's recursion limit (which exists to protect the interpreter's stack).

WORKAROUNDS

I usually work around this problem by one of:

  • Increasing the recursion limit with sys.setrecursionlimit (this only helps up to a point; set it too high and a deep recursion can crash the interpreter with a real stack overflow)
  • Using a for loop to fill up the cache for smaller argument values first
  • Rewriting the function to use a list as a manual stack (via append and pop calls), in other words converting the recursive implementation into an iterative one
  • Using an alternative programming language

but I find all of these quite error-prone.
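For example, the cache-warming workaround can be sketched with a one-argument stand-in (the two-argument f works the same way but needs a fussier warming schedule); the memoized decorator, addup, addup_warmed, and the step size here are all illustrative:

```python
import functools

def memoized(func):
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoized
def addup(n):
    # deliberately naive recursion: depth grows linearly with n
    return n if n < 2 else addup(n - 1) + n

def addup_warmed(n, step=200):
    # Warm the cache bottom-up so that no single call recurses more
    # than ~step levels before hitting a cached value.
    for i in range(step, n, step):
        addup(i)
    return addup(n)

print(addup_warmed(100000))  # prints 5000050000
```

A bare addup(100000) would exceed the default recursion limit; the warming loop keeps each call's fresh recursion bounded by step.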

QUESTION

Is there a way to write an @Bigstack decorator that would simulate the effect of having a really big stack?

Note that my functions normally make several recursive function calls so this is not the same as tail recursion - I really do want to save all the internal state of each function on the stack.

WHAT I'VE TRIED

I've been thinking about using a list of generators as my stack. By probing the stack frame I could work out when the function has been called recursively, and then trigger an exception to return to the decorator code. However, I can't work out a way of gluing these ideas together to make anything that actually works.

Alternatively, I could try accessing the abstract syntax tree for the function and transforming calls to recursive functions into yield statements, but this seems like it's heading in the wrong direction.

Any suggestions?

EDIT

It certainly looks like I am misusing Python, but another approach I have been considering is to use a different thread for each block of, say, 500 stack frames and then insert queues between each consecutive pair of threads - one queue for arguments, and another queue for return values. (Each queue will have at most one entry in it.) I think this probably doesn't work for some reason - but I'll probably only work out why after I've tried to implement it.

Peter de Rivaz
  • Yes, recursion in Python (like C) is usually only useful when you're recursing log(N) times, as that's going to stay small. If you're recursing N times you'll need to iterate. That said, it looks like there's a decorator hack here: http://code.activestate.com/recipes/474088/ – Ben Hoyt Nov 21 '12 at 21:01
  • Thanks, that's exactly the kind of thing I'm looking for - although as far as I can see, that example only covers tail recursion and I can't work out how to extend it to my case. – Peter de Rivaz Nov 21 '12 at 21:04
  • If you really want recursion, I think your manual stack approach is probably most robust and explicit. Magic is for wizards, not programmers. :-) – Ben Hoyt Nov 21 '12 at 21:04
  • I'm afraid you are trying to do something that is not what Python was built for. Anyway, maybe the following decorator can help you: http://code.activestate.com/recipes/474088/ – lucasg Nov 21 '12 at 21:05
  • Another possibly relevant link: http://paulbutler.org/archives/tail-recursion-in-python/ Will think about the problem some more and see if I can come up with something general that fully solves your problem. – Mark Amery Nov 21 '12 at 21:06
  • Relatedly, see Guido's "final words" on tail recursion elimination: [Tail Recursion Elimination](http://neopythonic.blogspot.co.nz/2009/04/tail-recursion-elimination.html) and [Final Words on Tail Calls](http://neopythonic.blogspot.co.nz/2009/04/final-words-on-tail-calls.html) – Ben Hoyt Nov 21 '12 at 21:08
  • Oh well, I suppose I should learn to stop treating Python like Haskell :( – Peter de Rivaz Nov 21 '12 at 21:20
  • @Peter I would probably just write that kind of stuff in Haskell, compile it with exported functions, then call it using ctypes in Python – Jon Clements Nov 21 '12 at 21:33
  • This [answer](http://stackoverflow.com/a/2918118/355230) might be helpful. – martineau Nov 22 '12 at 00:19
  • Use continuation passing style + a trampoline to implement your own tail-call elimination. – Marcin Nov 22 '12 at 16:41

3 Answers


To get around the recursion limit, you can catch the RuntimeError exception to detect when you've run out of stack space, and then return a continuation-ish function that, when called, restarts the recursion at the level where you ran out of space. Call this (and its return value, and so on) until you get a value, then try again from the top. Once you've memoized the lower levels, the higher levels won't run into a recursion limit, so eventually this will work. Put the repeated-calling-until-it-works in a wrapper function. Basically it's a lazy version of your warming-up-the-cache idea.

Here's an example with a simple recursive "add numbers from 1 to n inclusive" function.

import functools

def memoize(func):
    cache = {}
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = args, tuple(sorted(kwargs.items()))
        if key in cache:
            return cache[key]
        else:
            result = func(*args, **kwargs)
            if not callable(result):
                cache[key] = result
            return result
    return wrapper

@memoize
def _addup(n):
    if n < 2:
        return n
    else:
        try:
            result = _addup(n - 1)
        except RuntimeError:
            return lambda: _addup(n)
        else:
            return result if callable(result) else result + n

def addup(n):
    result = _addup(n)
    while callable(result):
        while callable(result):
            result = result()
        result = _addup(n)
    return result

assert addup(5000) == sum(xrange(5001))

Rather than returning the lambda function all the way back up the call chain, we can raise an exception to short-circuit that, which both improves performance and simplifies the code:

# memoize function as above, or you can probably use functools.lru_cache

class UnwindStack(Exception):
    pass

@memoize
def _addup(n):
    if n < 2:
        return n
    else:
        try:
            return _addup(n - 1) + n
        except RuntimeError:
            raise UnwindStack(lambda: _addup(n))

def _try(func, *args, **kwargs):
    try:
        return func(*args, **kwargs)
    except UnwindStack as e:
        return e.args[0]

def addup(n):
    result = _try(_addup, n)
    while callable(result):
        while callable(result):
            result = _try(result)
        result = _try(_addup, n)
    return result

This remains pretty inelegant, though, and still has a fair amount of overhead, and I can't imagine how you'd make a decorator out of it. Python isn't really suited to this kind of thing, I guess.

kindall
  • As of Py3.2, we have [functools.lru_cache](http://docs.python.org/dev/library/functools.html#functools.lru_cache) :) – st0le Nov 22 '12 at 06:50
  • Unfortunately you can't use that with this technique, as the memoizer needs to know not to cache returned function objects. – kindall Nov 22 '12 at 06:52
  • This is an interesting approach. Perhaps we could use a list in the decorator to record the arguments of all functions on the way down, and then if the toplevel function detects a runtime exception it could try running all the functions on the list (in LIFO order) before trying the toplevel function again. This then might cooperate better with the memoizer? – Peter de Rivaz Nov 22 '12 at 09:23
  • Added a version that uses an exception to short-circuit returning the "continuation." – kindall Nov 22 '12 at 16:39
  • @PeterdeRivaz: Yes, I think you could definitely do that. Or the decorator could record only the arglists that caused an exception, or something. – kindall Nov 22 '12 at 16:44

Here's an implementation that uses a list of generators as the stack:

def run_stackless(frame):
    stack, return_stack = [(False, frame)], []
    while stack:
        active, frame = stack.pop()
        action, res = frame.send(return_stack.pop() if active else None)
        if action == 'call':
            stack.extend([(True, frame), (False, res)])
        elif action == 'tail':
            stack.append((False, res))
        elif action == 'return':
            return_stack.append(res)
        else:
            raise ValueError('Unknown action', action)
    return return_stack.pop()

To use it you need to transform the recursive function according to the following rules:

 return expr                    -> yield 'return', expr
 recursive_call(args...)        -> (yield 'call', recursive_call(args...))
 return recursive_call(args...) -> yield 'tail', recursive_call(args...)

For example, with the cost function as a * b, your function becomes:

def f(a,b):
    if a>=0 and b>=0:
        yield 'return', min((yield 'call', f(a-1,b)),
                            (yield 'call', f(b,a-1))) + (a * b)
    else:
        yield 'return', 0

Testing:

In [140]: run_stackless(f(30, 4))
Out[140]: 410

In Python 2.6.2 it appears to incur roughly an 8-10x slowdown compared to direct recursive calls.

The tail action is for tail recursion:

def factorial(n):
    acc = [1]
    def fact(n):
        if n == 0:
            yield 'return', 0
        else:
            acc[0] *= n
            yield 'tail', fact(n - 1)
    run_stackless(fact(n))
    return acc[0]

The transformation to generator-recursive style is fairly easy, and could probably be done as a bytecode hack.
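Until then, the trampoline itself can at least be hidden behind a decorator, so callers invoke the transformed function like an ordinary one. A sketch (the Stackless wrapper and the depth example are illustrative; run_stackless is repeated from above so the snippet stands alone): a recursive call made while the trampoline is running hands back the raw generator for the trampoline to drive, and only the outermost call runs the loop.

```python
def run_stackless(frame):
    # The trampoline from above, unchanged.
    stack, return_stack = [(False, frame)], []
    while stack:
        active, frame = stack.pop()
        action, res = frame.send(return_stack.pop() if active else None)
        if action == 'call':
            stack.extend([(True, frame), (False, res)])
        elif action == 'tail':
            stack.append((False, res))
        elif action == 'return':
            return_stack.append(res)
        else:
            raise ValueError('Unknown action', action)
    return return_stack.pop()

class Stackless(object):
    """Illustrative decorator for generator functions written in the
    yield-('call'/'tail'/'return') style described above."""
    def __init__(self, func):
        self.func = func
        self.running = False

    def __call__(self, *args):
        if self.running:
            # Recursive call from inside the trampoline: hand back the
            # raw generator so run_stackless can push it on its stack.
            return self.func(*args)
        self.running = True
        try:
            return run_stackless(self.func(*args))
        finally:
            self.running = False

@Stackless
def depth(n):
    # 50000 nested calls would overflow the interpreter stack if
    # written as ordinary recursion.
    if n == 0:
        yield 'return', 0
    else:
        yield 'return', (yield 'call', depth(n - 1)) + 1

print(depth(50000))  # prints 50000
```

This is not reentrant across threads, but it shows that the yield-protocol functions can present a normal call interface.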

ecatmur
  • Thanks, I hadn't thought of doing it like that. I guess the overhead would get less significant as I start to put more real code into the function. – Peter de Rivaz Nov 22 '12 at 18:44

This approach combines memoisation and increased stack depth into a single decorator.

I generate a pool of threads, with each thread responsible for 64 levels of the stack. Threads are only created once and reused (but they are currently never deleted).

Queues are used to pass information between threads, although note that only the thread corresponding to the current stack depth will actually have work to do.

My experiments suggest this adds around 10% overhead for a simple recursive function (and should be less for more complicated functions).

import threading, Queue

class BigstackThread(threading.Thread):

    def __init__(self,send,recv,func):
        threading.Thread.__init__( self )
        self.daemon = True
        self.send = send
        self.recv = recv
        self.func = func

    def run(self):
        while 1:
            args = self.send.get()
            v = self.func(*args)
            self.recv.put(v)

class Bigstack(object):

    def __init__(self,func):
        self.func = func
        self.cache = {}
        self.depth = 0
        self.threadpool = {}

    def __call__(self,*args):
        if args in self.cache:
            return self.cache[args]
        self.depth+=1
        if self.depth&63:
            v = self.func(*args)
        else:
            T=self.threadpool
            if self.depth not in T:
                send = Queue.Queue(1)
                recv = Queue.Queue(1)
                t = BigstackThread(send,recv,self)
                T[self.depth] = send,recv,t
                t.start()
            else:
                send,recv,_ = T[self.depth]
            send.put(args)
            v = recv.get()

        self.depth-=1
        self.cache[args]=v
        return v

@Bigstack
def f(a,b):
    if a>=0 and b>=0:
        return min(f(a-1,b),f(b-1,a))+1
    return 0
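The code above is Python 2. For completeness, here is a direct Python 3 port (the only change needed is the Queue module rename to queue; the logic is identical), using the same example function:

```python
import threading, queue  # Python 3: the Py2 "Queue" module is now "queue"

class BigstackThread(threading.Thread):
    def __init__(self, send, recv, func):
        threading.Thread.__init__(self)
        self.daemon = True
        self.send = send
        self.recv = recv
        self.func = func

    def run(self):
        while True:
            args = self.send.get()          # wait for arguments
            self.recv.put(self.func(*args)) # compute and hand back

class Bigstack(object):
    def __init__(self, func):
        self.func = func
        self.cache = {}
        self.depth = 0
        self.threadpool = {}

    def __call__(self, *args):
        if args in self.cache:
            return self.cache[args]
        self.depth += 1
        if self.depth & 63:
            # stay on the current thread for 63 of every 64 levels
            v = self.func(*args)
        else:
            # every 64th level, hop to a thread with a fresh stack
            T = self.threadpool
            if self.depth not in T:
                send, recv = queue.Queue(1), queue.Queue(1)
                t = BigstackThread(send, recv, self)
                T[self.depth] = send, recv, t
                t.start()
            else:
                send, recv, _ = T[self.depth]
            send.put(args)
            v = recv.get()
        self.depth -= 1
        self.cache[args] = v
        return v

@Bigstack
def f(a, b):
    if a >= 0 and b >= 0:
        return min(f(a - 1, b), f(b - 1, a)) + 1
    return 0

print(f(5000, 0))  # prints 1, far deeper than the default limit allows
```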
Peter de Rivaz