Prelude
I have two implementations for a particular problem, one recursive and one iterative, and I want to know what causes the iterative solution to be ~30% slower than the recursive one.
Given the recursive solution, I write an iterative solution making the stack explicit.
Clearly, I am simply mimicking what the recursion does, so it is no surprise that the Python engine's built-in call bookkeeping is better optimized than my explicit stack. But can we write an iterative method with comparable performance?
My case study is Problem #14 on Project Euler.
Find the longest Collatz chain with a starting number below one million.
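For concreteness, the plain Collatz rule from the problem statement, with chain length counting every number visited, can be sketched as (a minimal version, no memoization and no shortcuts):

```python
def collatz_len(n):
    # Length of the Collatz chain starting at n, counting n itself
    # and the final 1: n -> n/2 if even, 3n + 1 if odd.
    length = 1
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        length += 1
    return length
```

For example, collatz_len(13) is 10, for the chain 13, 40, 20, 10, 5, 16, 8, 4, 2, 1.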
Code
Here is a parsimonious recursive solution (credit due to veritas in the problem thread plus an optimization from jJjjJ):
def solve_PE14_recursive(ub=10**6):
    def collatz_r(n):
        if n not in table:
            if n % 2 == 0:
                table[n] = collatz_r(n // 2) + 1
            elif n % 4 == 3:
                table[n] = collatz_r((3 * n + 1) // 2) + 2
            else:
                table[n] = collatz_r((3 * n + 1) // 4) + 3
        return table[n]
    table = {1: 1}
    return max(xrange(ub // 2 + 1, ub, 2), key=collatz_r)
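To see why the bit tests are valid: for odd n, 3n + 1 is always even, and when n ≡ 1 (mod 4) (the else branch), 3n + 1 is additionally divisible by 4, so one or two halvings can be folded into a single step. A quick equivalence check against the naive rule (my own sketch, not part of the timed code):

```python
def naive_len(n):
    # One Collatz step at a time: n -> n/2 (even) or 3n + 1 (odd).
    length = 1
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        length += 1
    return length

def shortcut_len(n):
    # Same chain length using the bit-based shortcuts from collatz_r.
    length = 1
    while n != 1:
        if n % 2 == 0:
            n, length = n // 2, length + 1
        elif n % 4 == 3:  # n odd, so 3n + 1 is even: fold one halving
            n, length = (3 * n + 1) // 2, length + 2
        else:  # n % 4 == 1: 3n + 1 = 4(3(n // 4) + 1): fold two halvings
            n, length = (3 * n + 1) // 4, length + 3
    return length
```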
Here's my iterative version:
def solve_PE14_iterative(ub=10**6):
    def collatz_i(n):
        stack = []
        while n not in table:
            if n % 2 == 0:
                x, y = n // 2, 1
            elif n % 4 == 3:
                x, y = (3 * n + 1) // 2, 2
            else:
                x, y = (3 * n + 1) // 4, 3
            stack.append((n, y))
            n = x
        ysum = table[n]
        for x, y in reversed(stack):
            ysum += y
            table[x] = ysum
        return ysum
    table = {1: 1}
    return max(xrange(ub // 2 + 1, ub, 2), key=collatz_i)
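As a sanity check that the two versions compute the same answer, here is a self-contained cross-check on a smaller bound (my own harness; solve_small is a hypothetical name, and range is used instead of xrange so it also runs on Python 3):

```python
def solve_small(ub, use_iterative):
    # Restatement of both solvers above, sharing one fresh memo table
    # per call, so their answers can be compared on a small bound.
    table = {1: 1}

    def collatz_r(n):
        if n not in table:
            if n % 2 == 0:
                table[n] = collatz_r(n // 2) + 1
            elif n % 4 == 3:
                table[n] = collatz_r((3 * n + 1) // 2) + 2
            else:
                table[n] = collatz_r((3 * n + 1) // 4) + 3
        return table[n]

    def collatz_i(n):
        stack = []
        while n not in table:
            if n % 2 == 0:
                x, y = n // 2, 1
            elif n % 4 == 3:
                x, y = (3 * n + 1) // 2, 2
            else:
                x, y = (3 * n + 1) // 4, 3
            stack.append((n, y))
            n = x
        ysum = table[n]
        for x, y in reversed(stack):
            ysum += y
            table[x] = ysum
        return ysum

    key = collatz_i if use_iterative else collatz_r
    return max(range(ub // 2 + 1, ub, 2), key=key)
```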
And the timings on my machine (an i7 with plenty of memory) using IPython:
In [3]: %timeit solve_PE14_recursive()
1 loops, best of 3: 942 ms per loop
In [4]: %timeit solve_PE14_iterative()
1 loops, best of 3: 1.35 s per loop
Comments
The recursive solution is awesome:
- Optimized to skip a step or two depending on the two least significant bits. My original solution didn't skip any Collatz steps and took ~1.86 s.
- It is difficult to hit Python's default recursion limit of 1000: collatz_r(9780657630) returns 1133 but requires fewer than 1000 recursive calls.
- Memoization avoids retracing.
- collatz_r lengths are calculated on-demand for max.
Playing around with it, timings seem to be precise to +/- 5 ms.
Languages with static typing like C and Haskell can get timings below 100 ms.
I put the initialization of the memoization table in the method by design for this question, so that timings would reflect the "re-discovery" of the table values on each invocation.

collatz_r(2**1002) raises RuntimeError: maximum recursion depth exceeded, whereas collatz_i(2**1002) happily returns 1003.
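The depth issue itself can be illustrated without the bit shortcuts (my own sketch; sys.setrecursionlimit is a blunt workaround for the recursive version, not a fix, since each frame still consumes stack space):

```python
import sys

table = {1: 1}

def collatz_len_rec(n):
    # Plain memoized recursion: one Python-level frame per Collatz step.
    if n not in table:
        table[n] = collatz_len_rec(n // 2 if n % 2 == 0 else 3 * n + 1) + 1
    return table[n]

# 2**1002 needs about 1003 nested calls, which exceeds the default
# limit of 1000; raising the limit lets the recursion complete.
sys.setrecursionlimit(2500)
print(collatz_len_rec(2**1002))  # 1003: 1002 halvings plus the final 1
```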
I am familiar with generators, coroutines, and decorators.
I am using Python 2.7. I am also happy to use Numpy (1.8 on my machine).
What I am looking for
- an iterative solution that closes the performance gap
- discussion on how Python handles recursion
- the finer details of the performance penalties associated with an explicit stack
I'm looking mostly for the first, though the second and third are very important to this problem and would increase my understanding of Python.
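On the third point, one way to isolate the explicit-stack bookkeeping from the Collatz logic is to time a trivial recursion against the equivalent list-as-stack loop (my own sketch; count_rec and count_iter are hypothetical names, and results will vary by machine):

```python
from timeit import timeit

def count_rec(n):
    # One Python frame per step: the interpreter does the bookkeeping in C.
    return 0 if n == 0 else count_rec(n - 1) + 1

def count_iter(n):
    # Same work with an explicit list: each step pays for an append,
    # and the unwind pays again for each pop.
    stack = []
    while n:
        stack.append(1)
        n -= 1
    total = 0
    while stack:
        total += stack.pop()
    return total

t_rec = timeit(lambda: count_rec(900), number=1000)
t_iter = timeit(lambda: count_iter(900), number=1000)
```

The direction of this micro-comparison need not match the Collatz case, since collatz_i additionally builds an (n, y) tuple per step and makes a second pass over reversed(stack).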