78

Suppose we want to compute some Fibonacci numbers, modulo 997.

For n=500 in C++ we can run

#include <iostream>
#include <array>

std::array<int, 2> fib(unsigned n) {
    if (!n)
        return {1, 1};
    auto x = fib(n - 1);
    return {(x[0] + x[1]) % 997, (x[0] + 2 * x[1]) % 997};
}

int main() {
    std::cout << fib(500)[0];
}

and in Python

def fib(n):
    if n == 1:
        return (1, 2)
    x = fib(n - 1)
    return ((x[0] + x[1]) % 997, (x[0] + 2 * x[1]) % 997)

if __name__ == '__main__':
    print(fib(500)[0])

Both will find the answer 996 without issues. We take values modulo 997 to keep the output size reasonable, and we use pairs to avoid exponential branching.

For n=5000, the C++ code outputs 783, but Python will complain

RecursionError: maximum recursion depth exceeded in comparison

If we add a couple of lines

import sys

def fib(n):
    if n == 1:
        return (1, 2)
    x = fib(n - 1)
    return ((x[0] + x[1]) % 997, (x[0] + 2 * x[1]) % 997)

if __name__ == '__main__':
    sys.setrecursionlimit(5002)  # a little headroom above the call depth of 5000
    print(fib(5000)[0])

then Python too will give the right answer.

For n=50000 C++ finds the answer 151 within milliseconds while Python crashes (at least on my machine).

Why are recursive calls so much cheaper in C++? Can we somehow modify the Python compiler to make it more receptive to recursion?

Of course, one solution is to replace recursion with iteration. For Fibonacci numbers, this is easy to do. However, this will swap the initial and the terminal conditions, and the latter is tricky for many problems (e.g. alpha–beta pruning). So generally, this will require a lot of hard work on the part of the programmer.
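For this particular recurrence the iterative rewrite is indeed easy; here is a minimal sketch mirroring the Python version above (the name fib_iter is my own):

```python
def fib_iter(n):
    # Same pair recurrence as the recursive Python version, computed bottom-up,
    # so the depth of the call stack no longer depends on n.
    a, b = 1, 2  # matches the recursive base case at n = 1
    for _ in range(n - 1):
        a, b = (a + b) % 997, (a + 2 * b) % 997
    return (a, b)

print(fib_iter(50000)[0])
```

This version handles n=50000 without touching the recursion limit.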

Deduplicator
jtallk
  • `sys.setrecursionlimit` works for me, note that the limit needs to be `5002` not `5000` – Alan Birtles Jun 15 '21 at 15:18
  • 6
@TedKleinBergman this isn't *tail* recursive. It looks like gcc can unroll this somewhat to use much less stack per call than CPython (intentionally) does – Caleth Jun 15 '21 at 15:31
  • @Caleth Pedantic technicality: I don't think unrolling would likely decrease and might rather even increase stack use *per call*. But it would reduce the number of calls by a fraction, and hence the total stack use. – eerorika Jun 15 '21 at 16:06
  • @eerorika per C++ function call, not per object code branch-with-link – Caleth Jun 15 '21 at 16:08
  • 17
    @Jarod42 Languages aren't compiled or interpreted; language *implementations* are. The standard Python distribution (known as CPython) compiles Python source to bytecode representation, similar to how Java or C# do. – Russell Borogove Jun 16 '21 at 02:12
  • 13
    BTW, you can dramatically reduce the number of recursive calls by using a cache, eg the `cache` or `lru_cache` decorators from [functools](https://docs.python.org/3/library/functools.html). And you can optimize that code slightly by assigning the result of the call to two names, eg `u, v = fib(n-1)`, rather than assigning to `x` and then indexing `x` 4 times. Also, you might like my fast Fibonacci code: https://stackoverflow.com/a/40683466/4014959 ;) – PM 2Ring Jun 16 '21 at 02:37
  • @Jarod42 this has more to do with dynamic vs static typing – Jivan Jun 16 '21 at 09:12
  • 1
    Check out Stackless Python. It doesn't use the call stack of the C runtime/OS, but manages its own. That might already be the modification you had in mind. – Jann Poppinga Jun 16 '21 at 14:31
  • 10
I'm a bit confused why you're talking about Fibonacci numbers, when the functions you show clearly don't compute the Fibonacci series. – Konrad Rudolph Jun 16 '21 at 17:24
  • 9
    Why the modulo-997 arithmetic? – dan04 Jun 16 '21 at 19:48
  • I've not read everything but [this](https://youtu.be/DnKxKFXB4NQ) may be relevant. – OverLordGoldDragon Jun 16 '21 at 22:23
  • 1
    Your underlying unasked questions are *"a) Why can't Python do tail recursion on function calls?"* and *"b) Why is cPython's stackframe size(/recursionlimit) so low?"*. Infinite (call-based) recursion is obviously going to crash, at some point. Tail recursion in g++ saves C++ from that. Modifying Fibonacci to be modulo 997 reduces this from an infinite problem to a finite one with a state space of at most 997**2, and makes it more amenable to caching, which Python certainly can do - and that's a one-line decorator call. This question's a moving target. It didn't ask "How to speed Python up?" – smci Jun 17 '21 at 04:47
  • 2
    Thirteen years of previous writings on this: [Does Python optimize tail recursion?](https://stackoverflow.com/questions/13591970/does-python-optimize-tail-recursion), [What is the maximum recursion depth in Python, and how to increase it?](https://stackoverflow.com/questions/3323001/what-is-the-maximum-recursion-depth-in-python-and-how-to-increase-it), [Python recursion limit vs stack size?](https://stackoverflow.com/questions/50193138/python-recursion-limit-vs-stack-size)... – smci Jun 17 '21 at 04:59
  • 1
    ...and essentially the same comparison [Speed comparison with Project Euler: C vs Python vs Erlang vs Haskell](https://stackoverflow.com/questions/6964392/speed-comparison-with-project-euler-c-vs-python-vs-erlang-vs-haskell) – smci Jun 17 '21 at 04:59
  • 1
    ...and [Is there a Python caching library?](https://stackoverflow.com/questions/1427255/is-there-a-python-caching-library) to which the 3.x standard-library answer is `functools`. – smci Jun 17 '21 at 05:03
  • "Of course, one solution is to replace recursion with iteration. [...] So generally this will require a lot of hard work on the part of the programmer." The proper tool to *generating* sequences define via recursion from a base-case is [*co*recursion](https://en.wikipedia.org/wiki/Corecursion). The corresponding feature in Python is aptly named a generator. It has the exact same initial and terminal conditions as the code shown in the question. – MisterMiyagi Jun 17 '21 at 05:30
  • 3
    Can you please clarify what exactly you are asking about? I see at least three separate topics in there: How can I compute "Fibonacci modulo 997" efficiently in Python? Why are calls cheaper in compiled versus interpreted programs? How can I make the "Python compiler" handler deeper recursion? The first seems answerable, the second is extremely broad, and the third should have several duplicates. – MisterMiyagi Jun 17 '21 at 06:19
  • 2
    @PM2Ring "you can dramatically reduce the number of recursive calls by using a cache". It's a good advice in general, but I really don't think it applies in this case. Actually adding `@lru_cache(maxsize=None)` to OP's code makes it fail faster because it reaches `maximum recursion depth`. Cache would help if the method called itself twice at each level. With the current structure, fib(500) calls itself 500 times, even without cache. – Eric Duminil Jun 17 '21 at 13:18
  • @Eric That's a very good point. Simply decorating the OP's code with `lru-cache` isn't sufficient. As you say, we also need to use a better algorithm that gives us a binary tree of calls instead of the OP's linear list. That's what my `fast_fib` code does, which I linked in that same comment. – PM 2Ring Jun 17 '21 at 13:48
  • FWIW, here's a slightly improved version of that code, which handles modulus (set `modulus` to 0 for plain Fibonacci numbers). The code also shows when it's hitting the cache, and prints the hit stats and cache size at the end. – PM 2Ring Jun 17 '21 at 13:51
  • [Fast mod Fibonacci](https://sagecell.sagemath.org/?z=eJylktFqgzAUhu_7FD-FDdOma-KlIAz2GCJDbcpETYfGq7F33zFGmxS3m-VC9P_ix8k5ea21UX1Rmd1FXXGty0inUgiO7nYZ23FIBUt2oFUV1YdCii-RgLBMIDlien7PWBuCGRGR28DqisG8W6eTTKu-QqPWs_AeT-uzp2qi_dueQ7OAkD6TOY4pZJD3yoy9c2U6X5lqh233y7ZbOPcKOjpNpHGEZDifEa-g4CiJrUfrcKI93AvudvoUkyfGAQW5SkYvpY8l4WKhE3TI75abRHgcq35KF_jIpM_C1j_T7B63UxVLC6ez0f_BjoZC24sgJVsDfTO_DNO21kobJxV_jGerCLFRxOm_RXgzdpeHwjmbr4h3aZkX0y3haJWOrI6xH4KHvRw=&lang=python) – PM 2Ring Jun 17 '21 at 13:52
  • 1
    @PM2Ring Indeed, I love this method, and already wrote an answer based on it : https://stackoverflow.com/a/68017363/6419007 . OP is calculating `F(2*n) % p`, BTW, not `F(n) % p`. – Eric Duminil Jun 17 '21 at 14:01
  • 1
    @KonradRudolph Yes, it's a bit confusing that the OP's code is computing (Fib(2n), Fib(2n+1)) mod 997. – PM 2Ring Jun 17 '21 at 14:03
  • Generally speaking (virtually always) use of recursion is bad programming practice, including (especially) fibonacci calculations. Solution: avoid recursion. – Jim Fell Jun 17 '21 at 19:07
  • 1
    @dan04 maybe modulo has been used to avoid requiring BigInts in C++? – Eric Duminil Jun 17 '21 at 20:11
  • 2
    You asked, "Why is [it] so _expensive_?" But I don't see anywhere you measured a _cost._ I only see that you showed that a particular Python implementation, _by default,_ allocates room on the stack for fewer function activations than a particular C++ implementation allocates. – Solomon Slow Jun 17 '21 at 21:23
  • [Recursion applied to lists vs trees in Python](https://stackoverflow.com/questions/66842109/recursion-applied-to-lists-vs-trees-in-python/66846438#66846438) seems applicable here but I'm not sure the fundamental premise of this question makes sense, as others have pointed out. Python is a language, not an implementation, so it needs to specify an implementation such as the CPython interpreter to ensure correct scoping. Pypy, for example, doesn't have the same characteristics as CPython does when it comes to recursion. Within an implementation, there's a _lot_ that can be done about recursion. – ggorlen Jul 07 '21 at 18:53

7 Answers

52

The issue is that Python has an internal limit on the number of recursive function calls.

That limit is configurable as shown in Quentin Coumes' answer. However, too deep a function chain will result in a stack overflow. This underlying limitation¹ applies to both C++ and Python. This limitation also applies to all function calls, not just recursive ones.

In general: You should not write² algorithms whose recursion depth grows linearly (or worse) with the input. Logarithmically growing recursion is typically fine. Tail-recursive functions are trivial to rewrite as iterations. Other recursions may be converted to iteration using external data structures (usually, a dynamic stack).

A related rule of thumb is that you shouldn't have large objects with automatic storage. This is C++-specific since Python doesn't have the concept of automatic storage.


¹ The underlying limitation is the execution stack size. The default size differs between systems, and different function calls consume different amounts of memory, so the limit isn't specified as a number of calls but in bytes instead. This too is configurable on some systems. I wouldn't typically recommend touching that limit due to portability concerns.

² Exceptions to this rule of thumb are certain functional languages that guarantee tail-recursion elimination - such as Haskell - where that rule can be relaxed in case of recursions that are guaranteed to be eliminated. Python is not such a language, and the function in question isn't tail-recursive. While C++ compilers can perform the elimination as an optimization, it isn't guaranteed, and is typically not optimized in debug builds. Hence, the exception doesn't generally apply to C++ either.

Disclaimer: The following is my hypothesis; I don't actually know their rationale: The Python limit is probably a feature that detects potentially infinite recursions, preventing likely unsafe stack overflow crashes and substituting a more controlled RecursionError.
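That controlled failure mode is easy to observe in CPython: the overflow surfaces as a catchable exception rather than a hard crash. A minimal sketch (the name diverge is my own):

```python
def diverge(n):
    # No base case: the recursion can only be stopped by the interpreter's limit.
    return diverge(n + 1)

try:
    diverge(0)
except RecursionError as e:
    # The program keeps running instead of crashing with a native stack overflow.
    print("caught:", e)
```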

Why are recursive calls so much cheaper in C++?

C++ is a compiled language. Python is interpreted. (Nearly) everything is cheaper in C++, except the translation from source code to an executable program.

Toby Speight
eerorika
  • 2
(responding to a now-deleted comment) I believe that practically, setting the limit low is encouragement not to write heavily-recursive code in Python, which is likely either an error or very inefficient due to creating lots of expensive intermediate objects – ti7 Jun 15 '21 at 15:31
  • @jtallk [Here](https://godbolt.org/z/oW6WeEYrM) it crashes before n=1'000'000. `Is there a reason for this relative difference?` Presumably, Python function calls consume more execution stack space. – eerorika Jun 15 '21 at 15:32
  • I guess it depends on the specifics of the machine. I was using https://ideone.com/5KsxJD – jtallk Jun 15 '21 at 15:39
  • 21
    Languages aren't compiled or interpreted; language *implementations* are. The standard Python distribution (known as CPython) compiles Python source to bytecode representation, similar to how Java or C# do. C++ is faster for many operations because of the semantics of the language, not because of compiling versus interpreting. – Russell Borogove Jun 16 '21 at 02:13
  • 46
@RussellBorogove That's mostly philosophical pedantry in my opinion. There exist only compiled implementations of C++ (I don't count "mostly conforming"), and in this context there is no practical difference between a pure interpreter and a just-in-time byte-compiler or whatever CPython's definition is. Neither has the opportunity that an AOT compiler has to spend time optimising. I do agree that language semantics help as well... although I suspect that whether a language is designed to be AOT- or JIT-compiled has an effect on what semantics the language is designed to have. – eerorika Jun 16 '21 at 02:32
  • 13
    It's correctness, not pedantry, and not a matter of opinion. An interpreted Python would be vastly slower than CPython. – Russell Borogove Jun 16 '21 at 02:37
  • 7
@eerorika, the CPython compiler is *not* JIT. The Python source is compiled to bytecode. That bytecode is then executed on a virtual machine. That is, the bytecode is the machine language of the virtual machine. – PM 2Ring Jun 16 '21 at 02:42
  • @PM2Ring `compiler is not JIT.` Is it not compiled to bytecode upon running the program? – eerorika Jun 16 '21 at 02:51
  • 9
    @eerorika [JIT compiling](https://en.wikipedia.org/wiki/Just-in-time_compilation) involves compilation *during* execution, and generally involves dynamic analysis to determine what parts of the code could benefit from being compiled. In contrast, CPython code is compiled immediately before execution commences, if necessary (the interpreter can use existing compiled images if their timestamp is later than that of the source code, if the system saves such images to disk). – PM 2Ring Jun 16 '21 at 02:58
  • 3
    Is [Cling](https://cdn.rawgit.com/root-project/cling/master/www/index.html) also only mostly conforming? They use clang and claim it supports everything clang does. – Arsenal Jun 16 '21 at 13:53
  • 3
@Arsenal According to https://root.cern/cling/ their goal is to be backward compatible with CINT, which according to their documentation is subtly different from C++. So, I would guess that Cling also differs, but to be fair my analysis isn't very deep. – eerorika Jun 16 '21 at 15:28
  • Scheme guarantees tail-recursion optimization; Haskell does not, nor does it need to, because a single function doesn't necessarily produce a single basic block. – chepner Jun 16 '21 at 18:09
  • 1
    Technically, any interpreted language can be made into a “compiled” language by generating executable files that contain both the interpreted code and the interpreter. – dan04 Jun 16 '21 at 22:55
  • 3
    @ti7: The official reason is to throw a (friendly, handlable) exception before you blow the real (C) stack and the program crashes hard. As the OP observed, raising the limit too high means the program crashes hard; you couldn't catch the exception and handle it in a nice way even if you wanted to. – ShadowRanger Jun 16 '21 at 23:56
  • 3
@eerorika C++ compiled to wasm produces wasm bytecode, which is interpreted by essentially the same engine as javascript bytecode is in a browser. The resulting wasm is much faster than JS for many purposes, because the semantics of C++ make it amenable to optimization. Generally highly optimized "gc-style" languages take a 2x hit over non-gc languages, and languages that go further like python another 2x-5x hit; in C++, `pair` is just 8 bytes, while in Python the equivalent is a much more complicated structure. There is nothing _wrong_ with that cost, but it is semantics, not JIT. – Yakk - Adam Nevraumont Jun 17 '21 at 13:34
  • @ShadowRanger of course! the original comment was why it was _so very low_ rather than something higher – ti7 Jun 17 '21 at 15:08
  • 1
    @PM2Ring A virtual machine executing bytecode is essentially the same as an interpreter. – Barmar Jun 18 '21 at 22:14
  • @Barmar Of course. It's the 2nd strategy for an interpreter described on [Wikipedia](https://en.wikipedia.org/wiki/Interpreter_(computing)). Whereas a JIT scheme is generally a combination of the 2nd & 3rd strategies, where bytecode may be translated to the machine code of the actual hardware CPU. – PM 2Ring Jun 18 '21 at 22:40
45

Let me first answer your direct questions:

Why are recursive calls so much cheaper in C++?

Because C++ has no limitation on recursive call depth, except the size of the stack. And being a fully compiled language, loops (including recursion) are much faster in C++ than in Python (which is the reason why special Python modules like numpy/scipy directly use C routines). Additionally, most C++ implementations use a special feature called tail recursion elimination (see later in this post) and optimize some recursive code into iterative equivalents. This is nice here but not guaranteed by the standard, so with other compilers or options the same program could crash miserably - but tail recursion is probably not involved here.

If the recursion is too deep and exhausts the available stack, you will invoke the well-known Undefined Behaviour where anything can happen, from an immediate crash to a program giving wrong results (IMHO the latter is much worse and cannot be detected...)

Can we somehow modify the Python compiler to make it more receptive to recursion?

No. The Python implementation deliberately performs no tail recursion elimination. You could increase the recursion limit, but this is almost always a bad idea (see later in this post for why).

Now for the true explanation of the underlying rationale.

Deep recursion is evil, full stop. You should never use it. Recursion is a handy tool when you can make sure that the depth will stay in sane limits. Python uses a soft limit to warn the programmer that something is going wrong before crashing the system. On the other hand, optimizing C and C++ compilers often internally change tail recursion into an iterative loop. But relying on it is highly dangerous because a slight change could prevent that optimization and cause an application crash.

As found in this other SO post, common Python implementations do not implement tail recursion elimination. So you should not use recursion at a depth of 5000, but should instead use an iterative algorithm.

As your underlying computation will need all Fibonacci numbers up to the specified one, it is not hard to iteratively compute them. Furthermore, it will be much more efficient!
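As a sketch of one such iterative computation, written as a generator so it keeps the same initial condition as the question's code (the names fib_pairs and fib are mine, and it reproduces the question's pair recurrence rather than plain Fibonacci numbers):

```python
from itertools import islice

def fib_pairs():
    # Yields the question's pair for n = 1, 2, 3, ... with no recursion at all.
    a, b = 1, 2
    while True:
        yield (a, b)
        a, b = (a + b) % 997, (a + 2 * b) % 997

def fib(n):
    # n-th pair; stack depth is constant regardless of n.
    return next(islice(fib_pairs(), n - 1, None))

print(fib(500)[0])
```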

Toby Speight
Serge Ballesta
  • 34
    "Deep recursion is evil, full stop." -- This seems like a strong statement when some languages (e.g. pure PF langs) have no iteration at all but are still quite usable – Quelklef Jun 16 '21 at 04:22
@Quelklef: I am sorry if I was not clear. I did not mean that recursion is evil; it is indeed a nice tool. What I meant is that when using it, a programmer should wonder whether the depth will stay acceptable. In Python, a beginner should never try to increase the recursion limit, because they would be better off using a different algorithm. It is in fact a very advanced feature that should only be used in rare use cases, after a careful analysis to make sure that it is really worth it... – Serge Ballesta Jun 16 '21 at 09:04
... And in C++ (the other tagged language), even if tail recursion elimination is a fact with optimizing compilers, it is not guaranteed by the current language standard. So relying on it could result in an application that works fine or fails miserably depending on the compiler used, or even on compilation options. That is the reason why I strongly urge beginners to avoid **deep** recursion as much as they can. – Serge Ballesta Jun 16 '21 at 09:08
  • 2
    Python is not a functional programming language. Serge Ballesta's statement may not apply to all languages, but it certainly does for Python. – theberzi Jun 16 '21 at 09:10
  • 1
    Re *"Deep recursion is evil, full stop"*: But many problems naturally lends themselves to recursion, like [traversing (binary) trees](https://en.wikipedia.org/wiki/Tree_traversal). Is implementing your own stack any better? – Peter Mortensen Jun 16 '21 at 10:58
  • This answer does not address the question. The Fibonacci code is clearly a dummy example: the OP's finger pointed at the Moon. You were expected to look at the Moon, not at the finger. – Edgar Bonet Jun 16 '21 at 11:40
  • 7
*As you know that you will need all Fibonacci numbers below the searched one*. No, you don't. The nth Fibonacci number is equal to the nearest integer to φ^n/√5, where φ is the golden ratio ((1 + √5)/2). Neither recursion nor iteration is needed to calculate Fibonacci numbers. (But calculating those numbers isn't the issue in the question) – Abigail Jun 16 '21 at 11:42
  • 5
    @Abigail: You are perfectly right on a mathematical point of view. And plain wrong when we come to large numbers in Python. The direct mathematical formula will use floating point numbers which are [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) 64 bits binary numbers. With a precision of at most 15 or 16 decimal digits. While Python integers are multiprecision numbers, with an arbitrary precision. – Serge Ballesta Jun 16 '21 at 12:04
  • 1
@EdgarBonet: I do not agree. What this answer means is that, *except in very special use cases*, if you find yourself having to increase the Python recursion limit, you should not do so unless no other path is possible. And even then, you should carefully analyze what the highest possible recursion limit would be **in that specific use case**. And a beginner should never do it without asking a more experienced programmer. But it is only my opinion and my advice. And I have only been programming for more than 40 years... – Serge Ballesta Jun 16 '21 at 12:10
The quality of your advice is not disputed. Yet, you failed to address the OP's question. – Edgar Bonet Jun 16 '21 at 13:38
  • 1
    @EdgarBonet: This is a good point. I have edited my post. – Serge Ballesta Jun 16 '21 at 14:29
  • 2
    @PeterMortensen: Traversing a binary tree is not *deep* recursion in the sense I assume this answer means (and as [eerorika's answer](https://stackoverflow.com/a/67989063) makes explicit): the recursion depth needed to traverse a balanced binary tree of *n* elements is O(log *n*). If your tree is small enough to fit in RAM (or even on disk) and not extremely unbalanced (to the point where even calling it a tree becomes questionable), Python's default recursion limit is more than enough to traverse it. – Ilmari Karonen Jun 16 '21 at 14:29
  • @SergeBallesta: You can use the formula with a high-precision floating-point type or with `Fraction`. Then it's just a matter of calculating an accurate-enough approximation of φ (which can be done by iterating `phi = 1 + 1 / phi`) to make the integer part correct. – dan04 Jun 16 '21 at 16:34
  • 2
    @dan04… and that's supposed to be faster than a simple recursion adding O(n) integers?! – Konrad Rudolph Jun 16 '21 at 23:00
  • 13
    The C++ code isn’t tail recursive, and neither GCC nor clang transform it into a tail recursive implementation. Consequently, neither perform TCO here (at `-O2`). And the C++ code *does* crash for larger `n`. This answer is completely unrelated to the actual reason why the code works in C++ but not in Python. – Konrad Rudolph Jun 16 '21 at 23:03
  • @KonradRudolph: It depends on how many iterations are needed to compute φ to the necessary precision. I haven't done the math. – dan04 Jun 16 '21 at 23:03
  • 8
    For a less academic argument that not all the Fibonacci numbers are needed, consider the following standard-ish algorithm: compute the matrix power [[1,1],[1,0]]^n by repeated squaring, and then return the top left element of the result. This only computes O(log(n)) Fibonacci numbers below the searched one. – Daniel Wagner Jun 17 '21 at 02:52
  • 2
    @DanielWagner: [Indeed](https://stackoverflow.com/a/68017363/6419007). – Eric Duminil Jun 17 '21 at 14:02
  • 2
    @dan04 There are faster ways to compute phi, eg `phi = (phi*phi + 1) / (2*phi - 1)`. [Demo](https://sagecell.sagemath.org/?z=eJxFTjsOgzAM3TnFE1NSWloqsSAhdeAeKA1JiAQEBQ89fg1Ewov9_D62jWHGYLSf1QQ_ryESugTVhu4OZ0iHhcyPso_nHpWmbDAWVgzeedra-iWbDFwnRgvWJVIexJUhZLlGo1lz8ge9jp4Xncirss5Phw0RPecgqsUZUaFIhvLrqZ_M4mgUMt29MgS32z4WqCSeEO8DPRhdyri_x2v5B6tSSmc=&lang=python). – PM 2Ring Jun 17 '21 at 14:51
  • 1
    @Abigail That phi-based formula, which is derived from [Binet's formula](https://en.wikipedia.org/wiki/Fibonacci_number#Closed-form_expression) is certainly useful, if you know phi with sufficient precision. But it's not much use for computing huge modulus Fibonacci numbers, eg `Fib(10**100) % (10**9)`, which equals 560546875. – PM 2Ring Jun 17 '21 at 15:04
  • 2
    @KonradRudolph: After your comment, and a second reading of the used algorithm I have changed my answer. You are absolutely right: tail recursion elimination is not involved here. – Serge Ballesta Jun 18 '21 at 06:13
32

A solution is a trampoline: the recursive function, instead of calling another function, returns a function that makes that call with the appropriate arguments. A loop one level higher calls each of those returned functions until we have the final result. I'm probably not explaining it very well; you can find resources online that do a better job.

The point is that this converts recursion to iteration. I don't think this is faster, and it may even be slower, but the recursion depth stays low.

An implementation could look like the code below. I split the pair x into a and b for clarity. I then converted the recursive function to a version that keeps track of a and b as arguments, making it tail-recursive.

def fib_acc(n, a, b):
    if n == 1:
        return (a, b)
    return lambda: fib_acc(n - 1, (a+b) % 997, (a+2*b) % 997)

def fib(n):
    x = fib_acc(n, 1, 2)
    while callable(x):
        x = x()
    return x

if __name__ == '__main__':
    print(fib(50000)[0])
Roel Schroeven
  • 4
here's one Python implementation: https://pypi.org/project/trampoline/ – jk. Jun 16 '21 at 09:37
  • 1
Beautiful. This would also be an excellent answer to https://stackoverflow.com/questions/189725/what-is-a-trampoline-function , simpler and clearer than most of the examples there. – Wayne Conrad Jun 16 '21 at 22:04
  • 1
    This seems to just re-invent generators. How does it address the difference between C++ and Python? – MisterMiyagi Jun 17 '21 at 05:26
  • 1
    @MisterMiyagi It addresses the "what can we do about it?" part of the question. – Roel Schroeven Jun 17 '21 at 09:02
  • So what about the other parts of the question? – MisterMiyagi Jun 17 '21 at 11:26
  • Cool looking answer. It's nice that you manage to save the code without changing much. Just like OP, you actually compute `fibonacci(2*n) % p`, not `fibonacci(n) % p`, BTW. – Eric Duminil Jun 17 '21 at 14:07
  • This doesn't answer half the question, which is why the recursion is so expensive in Python in the first place – glaba Jun 18 '21 at 19:57
  • 2
    @glaba Recursion (as typically implemented) involves a global name lookup, because the recursive call is just a free variable, not a direct reference to the current function. It could be sped up using various closure tricks to make the recursive reference a local variable, but that does nothing about the cost of calling *any* function in Python. – chepner Jun 18 '21 at 20:28
  • 1
    Also, functions are distinct objects that need to be interpreted by the Python runtime, not just blocks of code in memory that can be jumped to. Python's data model is just very, very different from C++'s. – chepner Jun 18 '21 at 20:31
  • To @MisterMiyagi and glaba: I didn't answer the first half of the question because that was already covered in other answers. Should I have included an explanation for that part too to make my answer a complete answer? – Roel Schroeven Jun 19 '21 at 14:48
13

"Both will find the answer 996 without issues"

I do see at least one issue: the answer should be 836, not 996.

It seems that both your functions calculate Fibonacci(2*n) % p, and not Fibonacci(n) % p.

996 is the result of Fibonacci(1000) % 997.

Pick a more efficient algorithm

An inefficient algorithm stays an inefficient algorithm, regardless of whether it's written in C++ or Python.

In order to compute large Fibonacci numbers, there are much faster methods than simple recursion with O(n) calls (see related article).

For large n, this recursive O(log n) Python function should run circles around your C++ code above:

from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n, p):
    "Calculate Fibonacci(n) modulo p"
    if n < 3:
        return [0, 1, 1][n]
    if n % 2 == 0:
        m = n // 2
        v1 = fibonacci(m - 1, p)
        v2 = fibonacci(m, p)
        return (2*v1 + v2) * v2 % p
    else:
        m = (n + 1) // 2
        v1 = fibonacci(m, p) ** 2
        v2 = fibonacci(m - 1, p) ** 2
        return (v1 + v2) % p


print(fibonacci(500, 997))
#=> 836
print(fibonacci(1000, 997))
#=> 996

Try it online!

It will happily calculate fibonacci(10_000_000_000_000_000, 997).

It's possible to add the recursion level as a parameter, in order to see how deep the recursion needs to go and to display it with indentation. Here's an example for n=500:

# Recursion tree:
500
  249
    124
      61
        30
          14
            6
              2
              3
                1
                2
            7
              4
          15
            8
        31
          16
      62
    125
      63
        32
  250

Try it online!

Your examples would simply look like very long diagonals:

500
  499
    498
      ...
        ...
           1
Eric Duminil
  • 1
    The question doesn't even compute actual fibonacci numbers. Even if all the other parts about comparing Python to C++ and the explicit request on making Python handle the existing recursion better were not there, this answer would still be missing the question. – MisterMiyagi Jun 17 '21 at 11:55
  • 1
    Thanks for taking the comment gracefully. FWIW, I think it is the proper approach, just not for the question asked. It certainly doesn't seem worse than the other answers. I'm afraid this answer just missed the upvote rush hour. – MisterMiyagi Jun 17 '21 at 13:27
  • @MisterMiyagi: I'm here to learn new stuff and have interesting discussions, not (only) get upvotes. :D I just wasted half an hour trying to get the same result as OP. In vain, because *both* the functions are somehow broken, and don't return F(n) % 997. It [should be](https://www.wolframalpha.com/input/?i=modulo%28fibonacci%28500%29%2C+997%29) 836, not 996. – Eric Duminil Jun 17 '21 at 13:37
  • 1
    @MisterMiyagi: I've found the problem and updated my answer. – Eric Duminil Jun 17 '21 at 14:03
  • 1
    @PM2Ring: I didn't notice your comment. You're right, it's not too clever to calculate F(n), F(n-1) *and* F(n+1) separately. – Eric Duminil Jun 23 '21 at 08:07
7

For Windows executables, the stack size is specified in the header of the executable. For the Windows version of Python 3.7 x64, that size is 0x1E8480, or exactly 2,000,000 bytes.

That version crashes with

Process finished with exit code -1073741571 (0xC00000FD)

and if we look that up we find that it's a Stack Overflow.

What we can see on the (native) stack with a native debugger like WinDbg (enable child process debugging) is

[...]
fa 000000e9`6da1b680 00007fff`fb698a6e     python37!PyArg_UnpackStack+0x371
fb 000000e9`6da1b740 00007fff`fb68b841     python37!PyEval_EvalFrameDefault+0x73e
fc 000000e9`6da1b980 00007fff`fb698a6e     python37!PyArg_UnpackStack+0x371
fd 000000e9`6da1ba40 00007fff`fb68b841     python37!PyEval_EvalFrameDefault+0x73e
fe 000000e9`6da1bc80 00007fff`fb698a6e     python37!PyArg_UnpackStack+0x371
ff 000000e9`6da1bd40 00007fff`fb68b841     python37!PyEval_EvalFrameDefault+0x73e

2:011> ? 000000e9`6da1bd40 - 000000e9`6da1ba40 
Evaluate expression: 768 = 00000000`00000300

So Python uses 2 stack frames per method call, and there's an enormous 768-byte difference between successive stack positions.

If you modify that value inside the EXE (make a backup copy) with a hex editor to, let's say 256 MB

Python.exe edited in 010 Editor

you can run the following code

[...]
if __name__=='__main__':
    sys.setrecursionlimit(60000)
    print(fib(50000)[0])

and it will give 151 as the answer.
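A more portable alternative sketch, without editing the executable: ask for a bigger stack via the threading module (this assumes the platform honors threading.stack_size; the 256 MiB figure simply mirrors the hex edit above, and the worker/result names are my own):

```python
import sys
import threading

def fib(n):
    if n == 1:
        return (1, 2)
    x = fib(n - 1)
    return ((x[0] + x[1]) % 997, (x[0] + 2 * x[1]) % 997)

result = []

def worker():
    sys.setrecursionlimit(60000)
    result.append(fib(50000)[0])

# Request a 256 MiB stack for threads created after this call
threading.stack_size(256 * 1024 * 1024)
t = threading.Thread(target=worker)
t.start()
t.join()
print(result[0])
```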


In C++, we can also force a stack overflow, e.g. by passing 500,000 as the parameter. While debugging, we get

0:000> .exr -1
ExceptionAddress: 00961015 (RecursionCpp!fib+0x00000015)
ExceptionCode: c00000fd (Stack overflow)
[...]
0:000> k
[...]
fc 00604f90 00961045     RecursionCpp!fib+0x45 [C:\...\RecursionCpp.cpp @ 7] 
fd 00604fb0 00961045     RecursionCpp!fib+0x45 [C:\...\RecursionCpp.cpp @ 7] 
fe 00604fd0 00961045     RecursionCpp!fib+0x45 [C:\...\RecursionCpp.cpp @ 7] 
ff 00604ff0 00961045     RecursionCpp!fib+0x45 [C:\...\RecursionCpp.cpp @ 7] 

0:000> ? 00604ff0 - 00604fd0
Evaluate expression: 32 = 00000020

which is just 1 stack frame per method call and only 32 bytes difference on the stack. Compared to Python, C++ can do 768/32 = 24x more recursions for the same stack size.

My Microsoft compiler created the executable with the default stack size of 1 MB (Release build, 32 Bit):

010 Editor for C++ executable

The 64-bit version has a stack difference of 64 bytes (also a Release build).


Tools used:

  • WinDbg (native debugger, with child process debugging enabled)
  • 010 Editor (hex editor)

Thomas Weller
6

You can increase the recursion limit using:

import sys

sys.setrecursionlimit(new_limit)

But note that this limit exists for a reason and that pure Python is not optimized for recursion (and compute-intensive tasks in general).
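As a minimal sketch, you can inspect the limit before and after changing it:

```python
import sys

# CPython's default recursion limit is 1000
print(sys.getrecursionlimit())

sys.setrecursionlimit(5000)
print(sys.getrecursionlimit())
```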

qcoumes
  • 15
    This just turns a clean Python stack overflow into a messy C stack overflow once you run out of C-level stack space. – user2357112 Jun 16 '21 at 07:40
  • 6
This isn't a solution, it just pushes the inevitable stack overflow due to infinite recursion further out. The OP's underlying unasked questions are *"a) Why doesn't Python do tail recursion on function calls?"* and *"b) Why is cPython's stackframe size(/recursionlimit) so low?"* – smci Jun 17 '21 at 04:43
1

An alternative to a trampoline is to use reduce. If you can change the recursive function to be tail-recursive, you can implement it with reduce; here's a possible implementation.

reduce is internally implemented iteratively, so you get to use your recursive function without blowing up the stack.

from functools import reduce

def inner_fib(acc, num):
    # acc is a pair of values; num (the current range element) is unused
    return [(acc[0] + acc[1]) % 997, (acc[0] + 2 * acc[1]) % 997]

def fib(n):
    return reduce(inner_fib,
                  range(2, n + 1),  # start from 2 since n=1 is covered in the base case
                  [1, 2])           # [1, 2] is the base case

if __name__ == '__main__':
    print(fib(50000)[0])