
In this video by Mathologer on, amongst other things, infinite sums, there are 3 different infinite sums shown at 9:25, when the video suddenly freezes and an elephant deity pops up, challenging the viewer to find "the probable values" of the expressions. I wrote the following script to approximate the last of the three (i.e. 1 + 3.../2...) with increasing precision:

from decimal import Decimal as D, getcontext  # for accurate results

def main(c):  # faster code when functions defined locally (I think)
    def run1(c):
        c += 1
        if c <= DEPTH:
            return D(1) + run3(c)/run2(c)
        else:
            return D(1)

    def run2(c):
        c += 1
        if c <= DEPTH:
            return D(2) + run2(c)/run1(c)
        else:
            return D(2)

    def run3(c):
        c += 1
        if c <= DEPTH:
            return D(3) + run1(c)/run3(c)
        else:
            return D(3)
    return run1(c)

getcontext().prec = 10  # too much precision isn't currently necessary

for x in range(1, 31):
    DEPTH = x
    print(x, main(0))

Now this works fine for 1 <= x <= 20ish, but it starts taking an eternity for each result after that. I do realize that this is due to the exponentially increasing number of function calls being made at each DEPTH level, and that I won't be able to calculate the series comfortably up to an arbitrary point. However, the program slows down too early for me to clearly identify the limit the series is converging to (it might be 1.75, but I need more DEPTH to be certain).

My question is: How do I get as much out of my script as possible (performance-wise)?

I have tried:
1. finding the mathematical solution to this problem. (No matching results)
2. finding ways to optimize recursive functions in general. According to multiple sources (e.g. this), Python doesn't optimize tail recursion by default, so I tried switching to an iterative style, but I ran out of ideas on how to accomplish this almost instantly...

Any help is appreciated!

NOTE: I know that I could go about this mathematically instead of "brute-forcing" the limit, but I want to get my program running well, now that I've started...

Michael
  • You could use three arrays to store the first few (or the first few thousand) values of `run1`, `run2` and `run3`. You can generate them beforehand or on the fly, for example by filling your arrays with `-1` (since your functions seem to only take positive values) and looking up the value before calling the function, storing it in the array if it doesn't exist yet. – pie3636 Aug 11 '16 at 12:58
  • @pie3636 Unfortunately, I find it hard to visualize how exactly such an approach would look. Please put your comment in (pseudo)-code to make it clearer for me. Thanks! – Michael Aug 11 '16 at 13:05
  • See [my answer](http://stackoverflow.com/a/38897580/2700737) below. – pie3636 Aug 11 '16 at 13:20

2 Answers


You can store the results of the run1, run2 and run3 functions in arrays to prevent them from being recalculated every time, since in your example, main(1) calls run1(1), which calls run3(2) and run2(2), which in turn call run1(3), run2(3), run1(3) (again) and run3(3), and so on.

You can see that run1(3) is being evaluated twice, and this only gets worse as the depth increases; if we count the number of times each function is called at each level, these are the results:

   run1 run2 run3
1  1    0    0
2  0    1    1
3  2    1    1
4  2    3    3
5  6    5    5
6  10   11   11
7  22   21   21
8  42   43   43
9  86   85   85
   ...
20 160,000 each (approx.)
   ...
30 160 million each (approx.)

This is actually a variant of Pascal's triangle, and you could probably work out the counts mathematically; but since you asked for a non-mathematical optimization, just notice how the number of calls grows exponentially: it doubles at each level. This is even worse because each call will itself generate thousands of subsequent calls with higher values, which is what you want to avoid.

Therefore what you want to do is store the value of each call, so that the function does not need to be called a thousand times (and itself make thousands more calls) to always get the same result. This is called memoization.

Here is an example solution in pseudocode:

before calling main, declare the arrays val1, val2, val3, all of size DEPTH + 1, and fill them with -1

function run1(c) # same thing for run2 and run3
    c += 1
    if c <= DEPTH
        local3 = val3[c]     # read the stored value of run3(c)
        if local3 is -1      # if run3(c) hasn't been computed yet
            local3 = run3(c) # we compute it
            val3[c] = local3 # and store it into the array
        local2 = val2[c]     # same with run2(c)
        if local2 is -1
            local2 = run2(c)
            val2[c] = local2

        return D(1) + local3/local2 # we use the values read from the arrays or just computed
    else
        return D(1)

Here I use -1 since your functions seem to only generate positive numbers, and -1 is an easy placeholder for the empty cells. In other cases you might have to use a dictionary, as Cabu did below. I suspect that would be slower, due to the cost of dictionary lookups versus array reads, but I might be wrong about that. Either way, your code should be much, much faster than it is now, with a cost of O(n) instead of O(2^n).

This would technically allow your code to keep going at a roughly constant cost per depth level, but the recursion will actually cause a stack overflow first. You might still be able to reach a depth of several thousand before that happens, though.
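If you want to sidestep the recursion entirely, note that the three values at one level depend only on the three values one level deeper, so the whole truncated expression can be evaluated bottom-up with three rolling variables. A rough sketch of that idea (`main_iterative` is my own name for it; I have only checked it against the first few depths of the original recursive code):

```python
from decimal import Decimal as D, getcontext

getcontext().prec = 10

def main_iterative(depth):
    # Base case: past the truncation depth, run1/run2/run3 return 1, 2 and 3.
    f1, f2, f3 = D(1), D(2), D(3)
    # Unwind the recursion: each pass computes the three values one level
    # closer to the top, using the three values from the level below.
    for _ in range(depth):
        f1, f2, f3 = D(1) + f3 / f2, D(2) + f2 / f1, D(3) + f1 / f3
    return f1

for x in range(1, 31):
    print(x, main_iterative(x))
```

This needs neither arrays nor recursion, so the reachable depth is limited only by runtime.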

Edit: As ShadowRanger added in the comments, you can keep your original code and simply add @lru_cache(maxsize=n) before each of your run1, run2 and run3 functions, where n is one of the first powers of two above DEPTH (for example, 32 if DEPTH is 25). You will need `from functools import lru_cache` for this to work.

pie3636
    Python offers `functools.lru_cache` that usually removes the need to reimplement memoizing from scratch. Just mentioning. – ShadowRanger Aug 11 '16 at 13:33
    @ShadowRanger I like that I learn new things from the comments - thanks! – Or Duan Aug 11 '16 at 13:48
  • @ShadowRanger and @pie3636 Thanks for the suggestion! I tried this, but for `DEPTH` = 200 it took a bit longer (~25s until termination as opposed to ~20s using explicit memoization). Why would that be? – Michael Aug 11 '16 at 16:33
    @Michael: Making the cache unbounded (`maxsize=None`) might help (the bounded cache requires more work per call, and if entries are evicted and needed later, you pay the cost of recomputing them). The only other difference is that `lru_cache` is `dict` based, and explicit memoization for this scenario could save some look up costs by using a `list`. That said, in Python 3.5 and higher, `lru_cache` was reimplemented in C, so an unbounded `lru_cache` is likely to beat even hand tuned explicit memoization that would still have to be done at the Python layer. – ShadowRanger Aug 12 '16 at 00:39

With some memoization, you can get all the way to the stack overflow:

from decimal import Decimal as D, getcontext  # for accurate results

def main(c):  # faster code when functions defined locally (I think)
    mrun1 = {}  # store partial results of run1, run2 and run3
    mrun2 = {}  # (not passed as parameters of the run functions
    mrun3 = {}  # so that they can be reset easily)

    def run1(c):
        c += 1
        if c in mrun1:  # if this partial result was already computed, return it
            return mrun1[c]

        if c <= DEPTH:
            v = D(1) + run3(c) / run2(c)
        else:
            v = D(1)

        mrun1[c] = v  # else store it and return the value
        return v

    def run2(c):
        c += 1
        if c in mrun2:
            return mrun2[c]

        if c <= DEPTH:
            v = D(2) + run2(c) / run1(c)
        else:
            v = D(2)

        mrun2[c] = v
        return v

    def run3(c):
        c += 1
        if c in mrun3:
            return mrun3[c]

        if c <= DEPTH:
            v = D(3) + run1(c) / run3(c)
        else:
            v = D(3)

        mrun3[c] = v
        return v

    return run1(c)

getcontext().prec = 150  # enough precision for the deepest truncations below

for x in range(1, 997):
    DEPTH = x
    print(x, main(0))

Python will overflow the stack (hit its recursion limit) if you go beyond 997.
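If you need to push a little beyond that, one general workaround (my addition, use with care) is to raise the interpreter's recursion limit before running the loop:

```python
import sys

# The default limit is around 1000 frames; each DEPTH level consumes a few
# frames, so raise it accordingly. Setting it far too high risks a real
# C-stack overflow that crashes the interpreter instead of raising a
# clean RecursionError.
sys.setrecursionlimit(5000)
```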

Cabu