
While formulating an answer to another SO question, I came across some strange behaviour regarding tail recursion in Mathematica.

The Mathematica documentation hints that tail call optimization might be performed. But my own experiments give conflicting results. Contrast, for example, the following two expressions. The first crashes the 7.0.1 kernel, presumably due to stack exhaustion:

(* warning: crashes the kernel! *)
Module[{f, n = 0},
  f[x_] := (n += 1; f[x + 1]);
  TimeConstrained[Block[{$RecursionLimit = Infinity}, f[0]], 300, n]
]

The second runs to completion, appearing to exploit tail call optimization to return a meaningful result:

Module[{f, n = 0},
  f[x_] := Null /; (n += 1; False);
  f[x_] := f[x + 1];
  TimeConstrained[Block[{$IterationLimit = Infinity}, f[0]], 300, n]
]

Both expressions define a tail-recursive function f. In the case of the first function, Mathematica apparently regards the presence of a compound expression as enough to defeat any chance of tail call optimization. Also note that the first expression is governed by $RecursionLimit and the second by $IterationLimit -- a sign that Mathematica is treating the two expressions differently. (Note: the SO answer referenced above has a less contrived function that successfully exploits tail call optimization.)

So, the question is: does anyone know the circumstances under which Mathematica performs tail-call optimization of recursive functions? A reference to a definitive statement in the Mathematica documentation or other WRI material would be ideal. Speculation is also welcome.

WReach
  • I find the `Null /;` condition mysterious. Would you be so kind as to explain the desired effect and how it achieves that effect? What happens if you leave it off? – Reb.Cabin Aug 17 '19 at 14:33
  • @Reb.Cabin I use `n` to count the number of calls. In the first example, incrementing `n` directly blocked the tail call optimization, so I moved `n += 1` into another definition -- but as part of an always-false condition so that the definition is essentially a no-op (albeit with the side effect of updating `n`). I wrote `Null` as an arbitrary choice since the return value is never used. It is possible to achieve the same result in a single definition: `f[x_] := f[x + 1] /; (++n; True)`. – WReach Aug 17 '19 at 23:48
  • I have a very long-running MMA program going right now. It runs for six to ninety-six hours at a time, delivering results at the end. I use `Fold` for iteration. I presume `Fold` does the equivalent of tail-calling, i.e., replacing the values of accumulation variables instead of accumulating them. My presumption is bolstered by the fact that if I use `FoldList` or if I `Sow` out the intermediate results, I run out of memory. It definitely does something different via `Fold`. – Reb.Cabin Aug 25 '19 at 15:44
  • @Reb.Cabin Because `Fold` is built into the kernel, my suspicion is that it is implemented as a simple imperative loop written in C -- especially since C compilers rarely perform tail call optimization. But if it _is_ implemented recursively, I agree that it must be optimized. `FoldList` does not overrun the stack but rather runs out of memory because it keeps a copy of the results from every iteration -- which is probably a lot of data long before 96 hours :) – WReach Aug 25 '19 at 16:08

2 Answers


I can summarize the conclusions my personal experience has led me to, with the disclaimer that what follows may not be the entirely correct explanation. The answer seems to lie in the difference between the Mathematica call stack and traditional call stacks, which originates in the fact that Mathematica's pattern-defined functions are really rules. So there are no real function calls. Mathematica needs a stack for a different reason: since normal evaluation happens from the bottom of an expression tree, it must keep intermediate expressions for the cases when deeper and deeper parts of (sub)expressions get replaced as a result of rule application (some parts of an expression grow from the bottom). This is the case, in particular, for rules defining what we'd call non-tail-recursive functions in other languages. So, once again, the stack in Mathematica is a stack of intermediate expressions, not function calls.
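As a minimal illustration of this point (my own sketch, not part of the original answer; the exact printed form may vary by version): a pattern "definition" is stored as a rewrite rule, which can be inspected with DownValues.

```mathematica
(* a pattern-based "function" is just a stored rewrite rule *)
g[x_] := x^2;
DownValues[g]
(* something like {HoldPattern[g[x_]] :> x^2} *)
```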

This means that if, as a result of rule application, a (sub)expression can be rewritten in its entirety, that expression branch need not be kept on the expression stack. This is probably what is referred to as tail call optimization in Mathematica -- and this is why in such cases we have iteration rather than recursion (this is one very good example of the difference between rule applications and function calls). Rules like f[x_] := f[x + 1] are of this type. If, however, some sub-expression gets rewritten, producing more expression structure, then the expression must be stored on the stack. The rule f[x_ /; x < 5] := (n += 1; f[x + 1]) is of this type, which is a bit hidden until we recall that () stands for CompoundExpression[]. Schematically, what happens here is f[1] -> CompoundExpression[n += 1, f[2]] -> CompoundExpression[n += 1, CompoundExpression[n += 1, f[3]]] -> etc. Even though the call to f is the last one every time, it happens before the full CompoundExpression[] finishes executing, so this still must be kept on the expression stack. One could perhaps argue that this is a place where an optimization could be made -- an exception for CompoundExpression -- but this is probably not easy to implement.

Now, to illustrate the stack accumulation process which I schematically described above, let us limit the number of recursive calls:

Clear[n, f, ff, fff];
n = 0;
f[x_ /; x < 5] := (n += 1; f[x + 1]);

ff[x_] := Null /; (n += 1; False);
ff[x_ /; x < 5] := ff[x + 1];

fff[x_ /; x < 5] := ce[n += 1, fff[x + 1]];

Tracing the evaluation:

In[57]:= Trace[f[1],f]
Out[57]= {f[1],n+=1;f[1+1],{f[2],n+=1;f[2+1],{f[3],n+=1;f[3+1],{f[4],n+=1;f[4+1]}}}}

In[58]:= Trace[ff[1],ff]
Out[58]= {ff[1],ff[1+1],ff[2],ff[2+1],ff[3],ff[3+1],ff[4],ff[4+1],ff[5]}

In[59]:= Trace[fff[1],fff]
Out[59]= {fff[1],ce[n+=1,fff[1+1]],{fff[2],ce[n+=1,fff[2+1]],{fff[3],ce[n+=1,fff[3+1]],   
{fff[4],ce[n+=1,fff[4+1]]}}}}

What you can see from this is that the expression stack accumulates for f and fff (the latter used just to show that this is a general mechanism, with ce[] just some arbitrary head), but not for ff, because, for the purposes of pattern matching, the first definition for ff is a rule that is tried but does not match, and the second definition rewrites ff[arg_] in its entirety and does not generate deeper sub-parts that need further rewriting. So the bottom line seems to be that you should analyze your function and see whether its recursive calls will grow the evaluated expression from the bottom or not. If yes, it is not tail-recursive as far as Mathematica is concerned.
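To connect this to a familiar example (my own sketch, not from the original answer; fact and factTR are hypothetical names): a naive factorial grows the expression n*fact[n - 1] from the bottom and is governed by $RecursionLimit, while an accumulator version rewrites the whole expression on each step and runs as iteration under $IterationLimit.

```mathematica
(* naive version: n*fact[n - 1] grows the expression tree, so
   intermediate expressions pile up on the stack ($RecursionLimit) *)
fact[0] = 1;
fact[n_] := n*fact[n - 1];

(* accumulator version: each rule application rewrites the expression
   in its entirety, so it runs as iteration ($IterationLimit) *)
factTR[0, acc_] := acc;
factTR[n_, acc_] := factTR[n - 1, n*acc];

Block[{$IterationLimit = Infinity}, factTR[10000, 1]];
(* completes without exhausting the stack *)
```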

My answer would not be complete without showing how to do the tail call optimization manually. As an example, let us consider a recursive implementation of Select. We will work with Mathematica linked lists to make it reasonably efficient rather than a toy. Below is the code for the non-tail-recursive implementation:

Clear[toLinkedList, test, selrecBad, sel, selrec, selTR]
toLinkedList[x_List] := Fold[{#2, #1} &, {}, Reverse[x]];
selrecBad[fst_?test, rest_List] := {fst,If[rest === {}, {}, selrecBad @@ rest]};
selrecBad[fst_, rest_List] := If[rest === {}, {}, selrecBad @@ rest];
sel[x_List, testF_] := Block[{test = testF}, Flatten[selrecBad @@ toLinkedList[x]]]

The reason I use Block and selrecBad is to make it easier to use Trace. Now, this blows the stack on my machine:

Block[{$RecursionLimit = Infinity}, sel[Range[300000], EvenQ]] // Short // Timing

You can trace on small lists to see why:

In[7]:= Trace[sel[Range[5],OddQ],selrecBad]

Out[7]= {{{selrecBad[1,{2,{3,{4,{5,{}}}}}],{1,If[{2,{3,{4,{5,{}}}}}==={},{},selrecBad@@{2,{3,{4, 
{5,{}}}}}]},{selrecBad[2,{3,{4,{5,{}}}}],If[{3,{4,{5,{}}}}==={},{},selrecBad@@{3,{4,{5, 
{}}}}],selrecBad[3,{4,{5,{}}}],{3,If[{4,{5,{}}}==={},{},selrecBad@@{4,{5,{}}}]},{selrecBad[4,
{5,{}}],If[{5,{}}==={},{},selrecBad@@{5,{}}],selrecBad[5,{}],{5,If[{}==={},{},selrecBad@@{}]}}}}}}

What happens is that the result gets accumulated deeper and deeper in the list. The solution is to not grow the depth of the resulting expression, and one way to achieve that is to make selrecBad accept one extra parameter, which is the (linked) list of accumulated results:

selrec[{fst_?test, rest_List}, accum_List] := 
    If[rest === {}, {accum, fst}, selrec[rest, {accum, fst}]];
selrec[{fst_, rest_List}, accum_List] := 
    If[rest === {}, accum, selrec[rest, accum]]

And modify the main function accordingly:

selTR[x_List, testF_] := Block[{test = testF}, Flatten[selrec[toLinkedList[x], {}]]]

This will pass our power test just fine:

In[14]:= Block[{$IterationLimit= Infinity},selTR[Range[300000],EvenQ]]//Short//Timing

Out[14]= {0.813,{2,4,6,8,10,12,14,16,18,20,
<<149981>>,299984,299986,299988,299990,299992,299994,299996,299998,300000}}

(note that here we had to modify $IterationLimit, which is a good sign). And using Trace reveals the reason:

In[15]:= Trace[selTR[Range[5],OddQ],selrec]

Out[15]= {{{selrec[{1,{2,{3,{4,{5,{}}}}}},{}],If[{2,{3,{4,{5,{}}}}}==={},{{},1},selrec[{2,{3,{4, 
{5,{}}}}},{{},1}]],selrec[{2,{3,{4,{5,{}}}}},{{},1}],If[{3,{4,{5,{}}}}==={},{{},1},selrec[{3, 
{4,{5,{}}}},{{},1}]],selrec[{3,{4,{5,{}}}},{{},1}],If[{4,{5,{}}}==={},{{{},1},3},selrec[{4, 
{5,{}}},{{{},1},3}]],selrec[{4,{5,{}}},{{{},1},3}],If[{5,{}}==={},{{{},1},3},selrec[{5, 
{}},{{{},1},3}]],selrec[{5,{}},{{{},1},3}],If[{}==={},{{{{},1},3},5},selrec[{},{{{{},1},3},5}]]}}}

That is, this version does not accumulate the depth of the intermediate expression, since the results are kept in a separate list.

Leonid Shifrin
  • +1 very nice analysis. I agree that CompoundExpression is a candidate for optimization although I was able to contrive a situation where a subexpression added definitions to CompoundExpression, changing its semantics (which the optimization would thwart). Emphasis on "contrive" -- I'm not sure it would be an issue in practice. – WReach Jan 07 '11 at 16:44
  • Excellent explanation! Now the difference between `$RecursionLimit` and `$IterationLimit` becomes clear, and what the `stack` is has become significantly clearer. – Alexey Popkov Mar 19 '11 at 07:39

The idea of this answer is to replace the parentheses () with a wrapper that does not make our expressions grow. Note that the function we are finding an alternative for is really CompoundExpression; the OP was correct in remarking that this function was ruining the tail recursion (see also the answer by Leonid). Two solutions are provided. This defines the first wrapper:

SetAttributes[wrapper, HoldRest];
wrapper[first_, fin_] := fin
wrapper[first_, rest__] := wrapper[rest]

We then have that

Clear[f]
k = 0;
mmm = 1000;
f[n_ /; n < mmm] := wrapper[k += n, f[n + 1]];
f[mmm] := k + mmm
Block[{$IterationLimit = Infinity}, f[0]]

This correctly calculates Total[Range[1000]].

------Note-----

Note that it would be misleading to set

wrapper[fin_] := fin;

as in the case

f[x_]:= wrapper[f[x+1]]

tail recursion does not occur (because wrapper, having HoldRest, will evaluate the single argument before applying the rule associated with wrapper[fin_]).

Then again, the definition above for f is not useful, as one could simply write

f[x_]:= f[x+1]

And have the desired tail recursion.

------Another note-----

In case we supply the wrapper with a lot of arguments, it may be slower than necessary. The user may opt instead to write

f[x_]:=wrapper[g1;g2;g3;g4;g5;g6;g7  , f[x+1]]

Second wrapper

The second wrapper feeds its arguments to CompoundExpression and will therefore be faster than the first wrapper if many arguments are provided. This defines the second wrapper.

SetAttributes[holdLastWrapper, HoldAll]
holdLastWrapper[fin_] := fin
holdLastWrapper[other_, fin_] := 
 Function[Null, #2, HoldRest][other, fin]
holdLastWrapper[others__, fin_] := 
 holdLastWrapper[
  Evaluate[CompoundExpression[others, Unevaluated[Sequence[]]]], fin]

Note: returning (empty) Sequences might be very useful in recursion in general. See also my answer here:

https://mathematica.stackexchange.com/questions/18949/how-can-i-return-a-sequence
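For readers unfamiliar with this behavior, an empty Sequence simply vanishes when spliced into an argument list (a quick illustration of my own, not part of the original answer):

```mathematica
(* Sequence splices its elements into the surrounding argument list;
   an empty Sequence disappears entirely *)
{1, Sequence[], 2}        (* {1, 2} *)
h[a, Sequence[b, c], d]   (* h[a, b, c, d] *)
```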

Note that this function will still work if only one argument is provided, as it has the attribute HoldAll rather than HoldRest, so that setting

f[x_]:= holdLastWrapper[f[x+1]]

will yield tail recursion (wrapper does not have this behavior).

Speed comparison

Let's create a nice long list (actually an expression with head Hold) of instructions:

nnnn = 1000;
incrHeld = 
  Prepend[DeleteCases[Hold @@ ConstantArray[Hold[c++], nnnn], 
    Hold, {2, Infinity}, Heads -> True], Unevaluated[c = 0]];

For these instructions, we can compare the performance (and outcome) of our wrappers with CompoundExpression

holdLastWrapper @@ incrHeld // Timing
CompoundExpression @@ incrHeld // Timing
wrapper @@ incrHeld // Timing

--> {{0.000856, 999}, {0.000783, 999}, {0.023752, 999}}

Conclusion

The second wrapper is better if you are not exactly sure when tail recursion will happen or how many arguments you will feed to the wrapper. If you are intent on feeding the wrapper 2 arguments -- for example, when you realize that all the second wrapper does is feed to CompoundExpression and you decide to do this yourself -- the first wrapper is better.

-----final note----

In CompoundExpression[args, Unevaluated[expr]], expr still gets evaluated before CompoundExpression is stripped, so solutions of this type are of no use.
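A sketch of why (my own illustration; h and m are hypothetical names): the Unevaluated wrapper is stripped when the argument reaches CompoundExpression, so the recursive call is evaluated while CompoundExpression is still on the expression stack, and the nesting accumulates just as in the f example above. One can check this with Trace:

```mathematica
(* Unevaluated is stripped when the argument reaches CompoundExpression,
   so the recursive call still evaluates inside it *)
Clear[h]; m = 0;
h[x_ /; x < 5] := CompoundExpression[m += 1, Unevaluated[h[x + 1]]];
Trace[h[1], h]  (* shows nested levels, like f above, not flat like ff *)
```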

  • This is very nice! +1. This seems to solve the problem for `CompoundExpression`. In many cases, however, this is not enough -- e.g. for a function such as `f[x_]:={f[x-1],f[x-2]}`, where the container surrounding the calls is not `CompoundExpression` (but, e.g., `List`). Still, this is a very nice achievement, it seems. I have to test more, but right now it seems to be the solution for `CompoundExpression`. In some sense, this is similar to what I did, since it splits this into two rules -- but your solution is general. If / when we test and become convinced that it generally works, it would make ... – Leonid Shifrin Mar 08 '13 at 15:12
  • ... sense to ask the question like 'programmatic tools for automatic tail-call optimization' and put your suggestion as one of the answers. I also vaguely remember that @Rojo had some method of tail call optimization, based on Trott-Strzebonski technique. – Leonid Shifrin Mar 08 '13 at 15:14
  • @Leonid Shifrin Woohoo :). Thank you. I am very interested in looking into these things further. I will also look at the Trott-Strzebonski technique, as yesterday and today I learned a lot by following the trail you laid out :). – Jacob Akkerboom Mar 08 '13 at 15:24
  • @Leonid Shifrin I wonder what tail recursion means if a function calls itself more than once in its body, as in this case it seems necessary to put on the stack that we have to return to the original function call. Looking into it :). – Jacob Akkerboom Mar 08 '13 at 15:33
  • My understanding is that in this case (more than one call in the body), the function cannot be made tail-recursive, unless rewritten in such a way that there is only one call. I think that tail recursion makes more sense when we are working with (perhaps very large) linked lists, so that we convert recursion to iteration. When you have multiple calls which you can't seem to be able to avoid, this may be a sign that you need to build a tree. But in a tree, typically, recursion depth is not a very big problem, since it is typically ~ `Log[n]`, where `n` is the total number of elements in ... – Leonid Shifrin Mar 08 '13 at 16:04
  • ... a tree, so for balanced trees, this quantity should never become too large. – Leonid Shifrin Mar 08 '13 at 16:05
  • @Leonid Shifrin discussing matters with Rojo led to many nicer forms of the function. I have edited the answer to show one example. Rojo also had some non-recursive ideas making use of Condition. Note that the wrapper[fin_] := fin statement will often not be necessary. – Jacob Akkerboom Mar 08 '13 at 17:37
  • Once you guys get something reasonably general and tested, it may make sense to post a self-answered question, describing your findings. I would participate if I have time, but currently I am under a ton of other things alas. But I would certainly appreciate being updated on your progress here. – Leonid Shifrin Mar 08 '13 at 18:34
  • @LeonidShifrin I've made a bunch of updates! I am thinking of making a quirky implementation of MergeSort, so quite likely I will spend some more time on the subject and can do additional testing. However, I am quite sure I have a good idea how things work now and that the functions work (unless I made a typo in the last edit ;) ). – Jacob Akkerboom Mar 11 '13 at 17:36
  • Looks nice! It would also be nice to take care of `Module`, `With` and `Block` possibly present at the top level of the r.h.s. of the recursive definition, more or less automatically. I think @Rojo mentioned that he had a solution using the Trott-Strzebonski technique, but I did not have the time to carefully look at it. Having that, one could construct a custom assignment operator which would automatically make a wide class of recursive functions tail-recursive. That would be really nice. – Leonid Shifrin Mar 11 '13 at 18:21
  • @LeonidShifrin I would be interested in looking further at those cases. I've also been looking at whether or not a great number of arguments will slow down Mathematica. I've made a beast of a function with which you can evaluate arguments in batches, where the batch size is customizable. Some problems with Partition and other functions which do not behave exactly as I want (they tend to evaluate things) caused the code to be slower than CompoundExpression. I think I will make a self-answered question about that too, but maybe you can tell me if you know more about the subject first. – Jacob Akkerboom Mar 12 '13 at 19:30
  • The function is itself a decent example of tail recursion. Though this can be a bit obscure because $RecursionLimit may be reached due to the deep nesting of linkedlists. – Jacob Akkerboom Mar 12 '13 at 19:32
  • Do you mean some new function which you did not publish yet, or is it the one currently posted in your answer? Alas, I have zero free time for the rest of today, and probably also very little time tomorrow. So, perhaps you could discuss with someone else (@Rojo?), if you need some more eye on this right now. I will try to catch up with this in a couple of days. – Leonid Shifrin Mar 12 '13 at 19:45