
First of all, I have two different implementations that I believe are correct. I have profiled them, and they appear to have about the same performance:

depth :: Tree a -> Int
depth Empty          = 0
depth (Branch b l r) = 1 + max (depth l) (depth r)


depthTailRec :: Tree a -> Int
depthTailRec = depthTR 0 where
    depthTR d Empty          = d
    depthTR d (Branch b l r) = let dl = depthTR (d+1) l
                                   dr = depthTR (d+1) r
                               in  max dl dr
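For reference, here is a self-contained sketch of both versions, assuming the usual `Tree` definition (which isn't shown above), so they can be compared directly:

```haskell
-- Assumed definition of the tree type (not shown in the question)
data Tree a = Empty | Branch a (Tree a) (Tree a)

depth :: Tree a -> Int
depth Empty          = 0
depth (Branch _ l r) = 1 + max (depth l) (depth r)

depthTailRec :: Tree a -> Int
depthTailRec = depthTR 0
  where
    depthTR d Empty          = d
    depthTR d (Branch _ l r) = let dl = depthTR (d+1) l
                                   dr = depthTR (d+1) r
                               in  max dl dr

-- A small sample tree of depth 3
sample :: Tree Int
sample = Branch 1 (Branch 2 (Branch 3 Empty Empty) Empty)
                  (Branch 4 Empty Empty)
```

Both versions report a depth of 3 for this sample tree.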

I was just wondering: aren't people always talking about how tail recursion can be beneficial for performance? A lot of questions jump into my head:

  1. How can you make the depth function faster?
  2. I read something about how Haskell's laziness can reduce the need for tail recursion; is that true?
  3. Is it true that every recursive function can be converted into a tail-recursive one?
  4. Finally, tail recursion can be faster and more space-efficient because it can be turned into a loop, reducing the need to push and pop stack frames. Is my understanding right?
Bob Fang
  • Your function is **not** tail recursive. And you shouldn't worry about it in this case. For it is not that easy to make a function tail recursive that needs to call itself twice to get the result. – Ingo Jan 18 '14 at 14:23
  • @Ingo Can you elaborate on why it is not tail recursive? Is there any good material on tail recursion? Thanks. – Bob Fang Jan 18 '14 at 14:28
  • Because the last function call that must be made in `depthTailRec` to obtain the result is not `depthTailRec` but `max`. – Ingo Jan 18 '14 at 14:30
  • Any comments about point number 2 about laziness? Would like to hear about that as well. – Yogesh Sajanikar Jan 18 '14 at 19:52
  • If there was a stackless implementation of Haskell, then recursive iteration over trees in general wouldn't be a problem since you can't maintain constant space anyway. Some of the comments here talk about such a situation: http://lambda-the-ultimate.org/node/2673 – CMCDragonkai Sep 11 '15 at 06:44

3 Answers


1. Why isn't your function tail recursive?

For a recursive function to be tail recursive, all of its recursive calls must be in tail position. A call is in tail position if it is the last thing the function does before returning. In your first example you have

depth (Branch _ l r) = 1 + max (depth l) (depth r)

which is equivalent to

depth (Branch _ l r) = (+) 1 (max (depth l) (depth r))

The last function called before the function returns is (+), so this is not tail recursive. In your second example you have

depthTR d (Branch _ l r) = let dl = depthTR (d+1) l
                               dr = depthTR (d+1) r
                            in max dl dr

which is equivalent to (once you've re-lambdified all the let statements and re-arranged a bit)

depthTR d (Branch _ l r) = max (depthTR (d+1) r) (depthTR (d+1) l)

Now the last function called before returning is max, which means that this is not tail recursive either.
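For contrast, here is a sketch of a function that really is tail recursive, a standard accumulator-based sum (this example is only an illustration, not taken from the question):

```haskell
-- Genuinely tail recursive: the recursive call to go is the last
-- thing evaluated; no (+) or max is pending after it returns.
sumList :: [Int] -> Int
sumList = go 0
  where
    go acc []     = acc              -- nothing left to do: return the accumulator
    go acc (x:xs) = go (acc + x) xs  -- tail call
```

(In practice you would also force the accumulator, e.g. with a bang pattern or `seq`, so that laziness doesn't build up a chain of `(+)` thunks; more on thunks in section 3.)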

2. How could you make it tail recursive?

You can make a tail recursive function using continuation-passing style. Instead of re-writing your function to take a state or an accumulator, you pass in a function (called the continuation) that is an instruction for what to do with the value computed -- i.e. instead of immediately returning to the caller, you pass whatever value you have computed to the continuation. It's an easy trick for turning any function into a tail-recursive function -- even functions that need to call themselves multiple times, as depth does. It looks something like this

depth t = go t id
 where
   go  Empty         k = k 0
   go (Branch _ l r) k = go l $ \dl ->
                           go r $ \dr ->
                             k (1 + max dl dr)

Now you see that the last function called in go before it returns is itself go, so this function is tail recursive.
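Assembled into self-contained form (again assuming the usual `Tree` definition), the CPS version can be tried out directly:

```haskell
data Tree a = Empty | Branch a (Tree a) (Tree a)

-- Continuation-passing style: instead of returning, each call passes
-- its result to the continuation k.
depth :: Tree a -> Int
depth t = go t id
  where
    go  Empty         k = k 0
    go (Branch _ l r) k = go l $ \dl ->
                            go r $ \dr ->
                              k (1 + max dl dr)
```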

3. Is that it, then?

(NB this section draws from the answers to this previous question.)

No! This "trick" only pushes the problem back somewhere else. Instead of a non-tail-recursive function that uses lots of stack space, we now have a tail-recursive function that builds up closures (suspended function applications) which could potentially take up a lot of space themselves. Fortunately, we don't need to work with arbitrary functions - in fact, there are only three kinds:

  1. \dl -> go r (\dr -> k (1 + max dl dr)) (which uses the free variables r and k)
  2. \dr -> k (1 + max dl dr) (with free variables k and dl)
  3. id (with no free variables)

Since there are only a finite number of functions, we can represent them as data

data Fun a = FunL (Tree a) (Fun a)  -- the fields are 'r' and 'k'
           | FunR  Int     (Fun a)  -- the fields are 'dl' and 'k'
           | FunId

We'll have to write a function eval as well, which tells us how to evaluate these "functions" at particular arguments. Now you can re-write the function as

depth t = go t FunId
 where
  go  Empty         k = eval k 0
  go (Branch _ l r) k = go l (FunL r k)

  eval (FunL r  k) d = go r (FunR d k)
  eval (FunR dl k) d = eval k (1 + max dl d)
  eval (FunId)     d = d

Note that both go and eval have calls to either go or eval in tail position -- therefore they are a pair of mutually tail recursive functions. So we've transformed the version of the function that used continuation-passing style into a function that uses data to represent continuations, and uses a pair of mutually recursive functions to interpret that data.

4. That sounds really complicated

Well, I guess it is. But wait! We can simplify it! If you look at the Fun a data type, you'll see that it's actually just a list, where each element is either a Tree a that we're going to compute the depth of, or it's an Int representing a depth that we've computed so far.

What's the benefit of noticing this? Well, this list actually represents the call stack of the chain of continuations from the previous section. Pushing a new item onto the list is pushing a new argument onto the call stack! So you could write

depth t = go t []
 where
  go  Empty         k = eval k 0
  go (Branch _ l r) k = go l (Left r : k)

  eval (Left r   : k) d = go   r (Right d : k)
  eval (Right dl : k) d = eval k (1 + max dl d)
  eval  []            d = d

Each new argument you push onto the call stack is of type Either (Tree a) Int, and as the functions recurse, they keep pushing new arguments onto the stack, which are either new trees to be explored (whenever go is called) or the maximum depth found so far (whenever eval is called).

This call strategy represents a depth-first traversal of the tree, as you can see by the fact that the left tree is always explored first by go, while the right tree is always pushed onto the call stack to be explored later. Arguments are only ever popped off the call stack (in eval) when an Empty branch has been reached and can be discarded.
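Here is the explicit-stack version in self-contained form (with the assumed `Tree` type), so the structure of the call stack is visible:

```haskell
data Tree a = Empty | Branch a (Tree a) (Tree a)

-- The "call stack" is an explicit list of Either (Tree a) Int:
--   Left r  = a right subtree still waiting to be explored
--   Right d = a left-subtree depth already computed
depth :: Tree a -> Int
depth t = go t []
  where
    go  Empty         k = eval k 0
    go (Branch _ l r) k = go l (Left r : k)

    eval (Left r   : k) d = go   r (Right d : k)
    eval (Right dl : k) d = eval k (1 + max dl d)
    eval []             d = d
```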

5. Alright... anything else?

Well, once you've noticed that you can turn the continuation-passing algorithm into a version that mimics the call stack and traverses the tree depth first, you might start to wonder whether there's a simpler algorithm that traverses the tree depth first, keeping track of the maximum depth encountered so far.

And indeed, there is. The trick is to keep a list of branches that you haven't yet explored, together with their depths, and keep track of the maximum depth you've seen so far. It looks like this

depth t = go 0 [(0,t)]
 where
  go depth  []    = depth
  go depth (t:ts) = case t of
    (d, Empty)        -> go (max depth d)  ts
    (d, Branch _ l r) -> go (max depth d) ((d+1,l):(d+1,r):ts)

I think that's about as simple as I can make this function within the constraints of ensuring that it's tail-recursive.
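In self-contained form (assuming the usual `Tree` type), with the worklist pairing each pending subtree with its depth:

```haskell
data Tree a = Empty | Branch a (Tree a) (Tree a)

-- Depth-first traversal with an explicit worklist of (depth, subtree)
-- pairs, tracking the maximum depth seen so far.
depth :: Tree a -> Int
depth t = go 0 [(0, t)]
  where
    go best []            = best
    go best ((d, u) : ts) = case u of
      Empty        -> go (max best d) ts
      Branch _ l r -> go (max best d) ((d+1, l) : (d+1, r) : ts)
```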

6. So that's what I should use?

To be honest, your original, non tail-recursive version is probably fine. The new versions aren't any more space efficient (they always have to store the list of trees that you're going to process next) but they do have the advantage of storing the trees to be processed next on the heap, rather than on the stack - and there's lots more space on the heap.

You might want to look at the partially tail-recursive function in Ingo's answer, which will help in the case when your trees are extremely unbalanced.

Chris Taylor
  • why does `go l (\dl -> go r (\dr -> k (1 + max dl dr)))` give the correct result? It seems quite confusing to me... Also, does it really help with performance, time-wise or space-wise? Cheers – Bob Fang Jan 18 '14 at 15:25
  • Also, what does `\dr -> k (1 + max dl dr)` even mean? – Bob Fang Jan 18 '14 at 15:37
  • @dorafmon I have no idea if any of these are faster or use less space in practice (although they certainly use less stack space). You could try profiling them! – Chris Taylor Jan 18 '14 at 15:41
  • @dorafmon `\dr -> k (1 + max dl dr)` is a function of one argument, `dr`, which finds the `max` of `dl` (which is in a higher scope) and `dr`, adds one to it, and uses that as an argument to the function `k` (which is also in a higher scope). – Chris Taylor Jan 18 '14 at 15:42
  • a side note: in your last version (#5), I think it is safe to take `max` only in `Empty` case, to save some calculations. – Will Ness Jan 20 '14 at 19:11
  • @ChrisTaylor why mystery?? :) bring your mouse up to hover above the +100 mark and reveal the name. Or inspect the edit history of your answer by clicking on "edited Jan 19...". (you might be joking there, but might not as well, if you concentrate on the content and pay less attention to the extraneous machinery of SO... ). :) – Will Ness Feb 17 '14 at 17:21
  • Ah, I had no idea you can do any of this fancy stuff. In that case, thanks for the points, Will (it's odd to have no idea who you are, since I seem to see a lot of you around recently... feel free to email me and remove the fog of mystery!) – Chris Taylor Feb 17 '14 at 17:38
  • Just comparing your explicit call stack version vs the continuation passing style. Both still use O(n) space right? But does both allocate space on the heap? In that case, the csp version might still be adequate if all you care about is preventing stack overflow. However the explicit call stack may use less space per n. – CMCDragonkai Sep 11 '15 at 06:31
  • That's right, both are O(n) space. Not sure which is more efficient in practice (like you I would guess the explicit call stack though) – Chris Taylor Sep 11 '15 at 06:38

A partially tail recursive version would be this:

depth d Empty = d
depth d (Branch _ l Empty) = depth (d+1) l
depth d (Branch _ Empty r) = depth (d+1) r
depth d (Branch _ l r)     = max (depth (d+1) l) (depth (d+1) r)

Note that tail recursion in this case (as opposed to the more complex full case in Chris's answer) is done only to skip the incomplete branches.

But this should be enough under the assumption that the depth of your trees is at most a double-digit number. In fact, if you balance your tree properly, this will be fine. If, on the other hand, your trees tend to degenerate into lists, then this will already help to avoid stack overflow (this is a hypothesis I haven't proved, but it is certainly true for a totally degenerate tree where no branch has two non-empty children).
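To see why this helps on degenerate trees, here is a self-contained sketch (the `Tree` type and the degenerate-tree builder are illustrative assumptions):

```haskell
data Tree a = Empty | Branch a (Tree a) (Tree a)

-- Partially tail-recursive depth with an accumulator
depth :: Int -> Tree a -> Int
depth d Empty              = d
depth d (Branch _ l Empty) = depth (d+1) l  -- tail call: skip the empty right child
depth d (Branch _ Empty r) = depth (d+1) r  -- tail call: skip the empty left child
depth d (Branch _ l r)     = max (depth (d+1) l) (depth (d+1) r)

-- A totally degenerate (list-like) tree: every node has an empty right child
degenerate :: Int -> Tree ()
degenerate 0 = Empty
degenerate n = Branch () (degenerate (n - 1)) Empty
```

On `degenerate 10000` only the tail-recursive clauses ever fire, so no `1 + max ...` contexts pile up the way they would with the naive version.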

Tail recursion is not a virtue in and of itself. It is only important when we would otherwise explode the stack with what would be a simple loop in imperative programming languages.

Ingo

to your 3., yes, e.g. by use of CPS technique (as shown in Chris's answer);

to your 4., correct.

to your 2., with lazy corecursive breadth-first tree traversal we naturally get a solution similar to Chris's last (i.e. his #5., depth-first traversal with explicated stack), even without any calls to max:

treedepth :: Tree a -> Int
treedepth tree = fst $ last queue
  where
    queue = (0,tree) : gen 1 queue

    gen  0   p                     = []
    gen len ((d,Empty)        : p) =                     gen (len-1) p 
    gen len ((d,Branch _ l r) : p) = (d+1,l) : (d+1,r) : gen (len+1) p 

Though both variants have space complexity of O(n) in the worst case, the worst cases themselves are different, and opposite to each other: the most degenerate trees are the worst case for depth-first traversal (DFT) and the best case (space-wise) for breadth-first (BFT); and similarly the most balanced trees are the best case for DFT and the worst for BFT.
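With the assumed `Tree` definition filled in, the breadth-first version can be checked directly:

```haskell
data Tree a = Empty | Branch a (Tree a) (Tree a)

-- Breadth-first, corecursive: the queue is consumed as it is produced,
-- and the last entry generated is necessarily at the maximum depth.
treedepth :: Tree a -> Int
treedepth tree = fst $ last queue
  where
    queue = (0, tree) : gen 1 queue

    gen 0   _                       = []
    gen len ((_, Empty)        : p) =                       gen (len - 1) p
    gen len ((d, Branch _ l r) : p) = (d+1, l) : (d+1, r) : gen (len + 1) p

-- A small sample tree of depth 3
sample :: Tree Int
sample = Branch 1 (Branch 2 (Branch 3 Empty Empty) Empty)
                  (Branch 4 Empty Empty)
```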

Will Ness