10

I seem to misunderstand tail recursion; according to this stackoverflow question R does not support tail recursion. However, let's consider the following functions to compute the nth fibonacci number:

Iterative version:

Fibo <- function(n){
    a <- 0
    b <- 1
    for (i in 1:n){
        temp <- b
        b <- a
        a <- a + temp
    }
    return(a)
}

"Naive" recursive version:

FiboRecur <- function(n){
    if (n == 0 || n == 1){
        return(n)
    } else {
        return(FiboRecur(n-1) + FiboRecur(n-2))
        }
}

And finally an example I found that should be tail call recursive:

FiboRecurTail <- function(n){
    fib_help <- function(a, b, n){
        if(n > 0){
            return(fib_help(b, a+b, n-1))
        } else {
            return(a)
        }
    }
    return(fib_help(0, 1, n))
}

Now if we take a look at the traces when these functions are called, here is what we get:

Fibo(25)
trace: Fibo(25)
[1] 75025


trace(FiboRecur)
FiboRecur(25)
Thousands of calls to FiboRecur and takes a lot of time to run

FiboRecurTail(25)
trace: FiboRecurTail(25)
[1] 75025

In the cases of Fibo(25) and FiboRecurTail(25), the answer is displayed instantaneously and only one call is made. For FiboRecur(25), thousands of calls are made and it runs for some seconds before showing the result.

We can also take a look at the run times using the benchmark function from the package rbenchmark:

benchmark(Fibo(30), FiboRecur(30), FiboRecurTail(30), replications = 5)
               test replications elapsed relative user.self sys.self user.child sys.child
1          Fibo(30)            5    0.00       NA     0.000        0          0         0
2     FiboRecur(30)            5   13.79       NA    13.792        0          0         0
3 FiboRecurTail(30)            5    0.00       NA     0.000        0          0         0

So if R does not support tail recursion, what is happening in FiboRecurTail(25) that makes it run as fast as the iterative version while the "naive" recursive function runs like molasses? Is it rather that R supports tail recursion, but does not optimize a "naive" recursive version of a function to be tail-call recursive like other programming languages (Haskell for instance) do? This is what I understand from this post in R's mailing list.

I would greatly appreciate if someone would shed some light into this. Thanks!

Community
  • 1
  • 1
brodrigues
  • 1,541
  • 2
  • 14
  • 19

3 Answers3

5

The difference is that for each recursion, FiboRecur calls itself twice. Within FiboRecurTail, fib_help calls itself only once.

Thus you have a whole lot more function calls with the former. In the case of FiboRecurTail(25) you have a recursion depth of ~25 calls. FiboRecur(25) results in 242,785 function calls (including the first).

I didn't time any of the routines, but note that you show 0.00 for both of the faster routines. You should see some difference with a higher input value, but note that Fibo iterates exactly as much as FiboRecurTail recurses.

Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
  • Just curious, was the method for counting function calls simple/elegant or brute-force? – r2evans Aug 16 '16 at 16:26
  • @r2evans It's easy enough to compute the number of calls to `FiboRecur`, but easier yet to duplicate his function and increment a variable in the global environment. – Matthew Lundberg Aug 16 '16 at 16:35
  • That's what I thought ... just wondering if there's an easy method with profiling or something similar. Thanks! – r2evans Aug 16 '16 at 16:36
1

In the naive recursive approach, you repetitively calculated a lot of values. For example, when you calculate FiboRecur(30) you will calculate FiboRecur(29) and FiboRecur(28), and each of these two calls are independent. And in FiboRecur(29) you will calculate FiboRecur(28) again and FiboRecur(27) even though FiboRecur(28) has already been calculated somewhere else as above. And this happens for every stage of recursion. Or simply put, for every increase of n, the calculation effort almost doubles but obviously, in reality it should just be as simple as add the last two calculated numbers together.

A little summary of FiboRecur(4): FiboRecur(0) is calculated twice, FiboRecur(1) is calculated three times, FiboRecur(2) is calculated twice and FiboRecur(3) is calculated once. The former three should really be calculated once and stored somewhere so that you can extract the values whenever they are needed. And that's why you see so many function calls even though it's not a large number.

In the tail recursive version, however, every previously calculated values are passed to the next stage via a + b parameter, which avoids countless repetitive calculations as in the naive recursive version, and thus more efficient.

Psidom
  • 209,562
  • 33
  • 339
  • 356
0

The following algorithm uses accumulator parameter technique to make things tail recursive, then wraps it in a memoization function.

  1. Number of function calls shouldn't necessarily differ for tail-recursion. This is mostly about managing stack memory, not speed. Every call to fib(n) generates calls to fib(n - 1) and fib(n - 2), expect in tail-recursive cases, the stack frame is reused rather than a new one being allocated for each call.
  2. Memoization is what gives a speed-boost. Results are cached for future use.
library(hash)

# Generate Fibonacci numbers
# Tail Recursive Algorithm using Accumulator Parameter Technique
fibTR <- function(n) {
  fibLoop <- function(acc, m, k) {
    if (k == 0)
      acc
    else
      fibLoop(acc = m, m = acc + m, k = k - 1)
  }

  fibLoop(acc = 0, m = 1, k = n)
}

# A generic memoization function for function fn taking integer input
memoize <- function(fn, inp) {
  cache <- hash::hash()
  key <- as.character(inp)

  if (hash::has.key(key = key, hash = cache))
    cache[[key]]
  else {
    cache[[key]] <- inp %>% fn
    cache[[key]]
  }
}

# Partial Application of a Function
# Memoized and Tail Recursive Fibonacci Number Generator
fib <- partial(.f = memoize, fn = fibTR)

# Get the first 10 Fibonacci numbers
map(.x = 0:9, .f = fib) %>% unlist

Running fibAux(10000) yields

Error: C stack usage  15927040 is too close to the limit

So, I doubt R does efficient tail call optimization.

Another issue is the construction of the cache or lookaside table. In functional languages such as Haskell, ML, ..., that intermediary data structures get built when you first partially call the function. Assuming the same effect in R, another issue is that memory allocation in R is very expensive so is growing vectors, matrices, etc: Here, we are growing a dictionary, and if we pre-allocate the dictionary of appropriate size, then we have to supply the n argument and the cache gets constructed every time we call the function which defeats the purpose.

// Here is F# code to do the same:

// A generate Fibonacci numbers: Tail Recursive Algorithm
let fibTR n =
    let rec fibLoop acc m k =
        match k with 
        | 0 -> acc
        | n -> fibLoop m (acc + m) (n - 1)

    fibLoop 0 1 n

// A generic memoization function
let memoize (fn: 'T -> 'U) =
    let cache = new System.Collections.Generic.Dictionary<_, _>()

    fun inp ->
        match cache.TryGetValue inp with
        | true, res -> res
        | false, _ ->
            let res = inp |> fn
            cache.Add(inp, res)
            res


// A tail recursive and
let fib = fibTR |> memoize

// Get the first 10 Fibonacci numbers
[ 0..9 ] |> List.map fib 
Sasha Babaei
  • 455
  • 3
  • 8