TLDR
You can compare the source code below, before and after to intuitively understand the approach without reading this whole answer.
I ran into issues with some multi-key quicksort code I was using to process very large blocks of text to produce suffix arrays. The code would abort due to the extreme depth of recursion required. With this approach, the termination issues were resolved. After conversion the maximum number of frames required for some jobs could be captured, which was between 10K and 100K, taking from 1M to 6M memory. Not an optimum solution, there are more effective ways to produce suffix arrays. But anyway, here's the approach used.
The approach
A general way to convert a recursive function to an iterative solution that will apply to any case is to mimic the process natively compiled code uses during a function call and the return from the call.
Taking an example that requires a somewhat involved approach, we have the multi-key quicksort algorithm. This function has three successive recursive calls, and after each call, execution begins at the next line.
The state of the function is captured in the stack frame, which is pushed onto the execution stack. When sort()
is called from within itself and returns, the stack frame present at the time of the call is restored. In that way all the variables have the same values as they did before the call - unless they were modified by the call.
Recursive function
def sort(a: list_view, d: int):
if len(a) <= 1:
return
p = pivot(a, d)
i, j = partition(a, d, p)
sort(a[0:i], d)
sort(a[i:j], d + 1)
sort(a[j:len(a)], d)
Taking this model, and mimicking it, a list is set up to act as the stack. In this example tuples are used to mimic frames. If this were encoded in C, structs could be used. The data can be contained within a data structure instead of just pushing one value at a time.
Reimplemented as "iterative"
# Assume `a` is view-like object where slices reference
# the same internal list of strings.
def sort(a: list_view):
stack = []
stack.append((LEFT, a, 0)) # Initial frame.
while len(stack) > 0:
frame = stack.pop()
if len(frame[1]) <= 1: # Guard.
continue
stage = frame[0] # Where to jump to.
if stage == LEFT:
_, a, d = frame # a - array/list, d - depth.
p = pivot(a, d)
i, j = partition(a, d, p)
stack.append((MID, a, i, j, d)) # Where to go after "return".
stack.append((LEFT, a[0:i], d)) # Simulate function call.
elif stage == MID: # Picking up here after "call"
_, a, i, j, d = frame # State before "call" restored.
stack.append((RIGHT, a, i, j, d)) # Set up for next "return".
stack.append((LEFT, a[i:j], d + 1)) # Split list and "recurse".
elif stage == RIGHT:
_, a, _, j, d = frame
stack.append((LEFT, a[j:len(a)], d)
else:
pass
When a function call is made, information on where to begin execution after the function returns is included in the stack frame. In this example, if/elif/else
blocks represent the points where execution begins after return from a call. In C this could be implemented as a switch
statement.
In the example, the blocks are given labels; they're arbitrarily labeled by how the list is partitioned within each block. The first block, "LEFT" splits the list on the left side. The "MID" section represents the block that splits the list in the middle, etc.
With this approach, mimicking a call takes two steps. First a frame is pushed onto the stack that will cause execution to resume in the block following the current one after the "call" "returns". A value in the frame indicates which if/elif/else
section to fall into on the loop that follows the "call".
Then the "call" frame is pushed onto the stack. This sends execution to the first, "LEFT", block in most cases for this specific example. This is where the actual sorting is done regardless which section of the list was split to get there.
Before the looping begins, the primary frame pushed at the top of the function represents the initial call. Then on each iteration, a frame is popped. The "LEFT/MID/RIGHT" value/label from the frame is used to fall into the correct block of the if/elif/else
statement. The frame is used to restore the state of the variables needed for the current operation, then on the next iteration the return frame is popped, sending execution to the subsequent section.
Return values
If the recursive function returns a value used by itself, it can be treated the same way as other variables. Just create a field in the stack frame for it. If a "callee" is returning a value, it checks the stack to see if it has any entries; and if so, updates the return value in the frame on the top of the stack. For an example of this you can check this other example of this same approach to recursive to iterative conversion.
Conclusion
Methods like this that convert recursive functions to iterative functions, are essentially also "recursive". Instead of the process stack being utilized for actual function calls, another programmatically implemented stack takes its place.
What is gained? Perhaps some marginal improvements in speed. Or it could serve as a way to get around stack limitations imposed by some compilers and/or execution environments (stack pointer hitting the guard page). In some cases, the amount of data pushed onto the stack can be reduced. Do the gains offset the complexity introduced in the code by mimicking something that we get automatically with the recursive implementation?
In the case of the sorting algorithm, finding a way to implement this particular one without a stack could be challenging, plus there are so many iterative sorting algorithms available that are much faster. It's been said that any recursive algorithm can be implemented iteratively. Sure... but some algorithms don't convert well without being modified to such a degree that they're no longer the same algorithm.
It may not be such a great idea to convert recursive algorithms just for the sake of converting them. Anyway, for what it's worth, the above approach is a generic way of converting that should apply to just about anything.
If you find you really need an iterative version of a recursive function that doesn't use a memory eating stack of its own, the best approach may be to scrap the code and write your own using the description from a scholarly article, or work it out on paper and then code it from scratch, or other ground up approach.