
The canonical function to demonstrate recursion is the factorial() function. I tried a simple implementation of it myself and came up with this:

factorial <- function(x) {
  if (x == 1)
    return(1)
  else
    return(x * factorial(x - 1))
}

From my survey of the topic, there seems to be some debate about whether it's better to use recursion or simple iteration. I wanted to see how R implements it and found a factorial() function in the gregmisc package. I thought I'd find something like my implementation, or else a regular iteration. What I found was this:

> factorial
function (x) 
gamma(x + 1)
<environment: namespace:base>

So the answer to my question of whether R prefers recursion or iteration is "neither". At least in this implementation. Are recursive functions avoided in R for good reason?

Update:

gregmisc version:

> ptm <- proc.time()
> factorial(144)
[1] 5.550294e+249
> proc.time() - ptm
   user  system elapsed 
  0.001   0.000   0.001 

my version:

> ptm <- proc.time()
> factorial(144)
[1] 5.550294e+249
> proc.time() - ptm
   user  system elapsed 
  0.002   0.001   0.006 
Milktrader

3 Answers


For calculation of integer factorial, the recursive implementation is slower and more complex. Invariably, iteration is used in production code.

The factorial function you refer to is in the base package. It operates on real values rather than integers, hence that implementation. Its documentation states:

factorial(x) (x! for non-negative integer x) is defined to be gamma(x+1)
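The identity is easy to check at the console, and it also shows what the gamma formulation buys you: values for non-integer arguments, and the 0! edge case, which the recursive version from the question never terminates on:

```r
# The documented identity: factorial(x) == gamma(x + 1).
stopifnot(factorial(5) == gamma(6))  # both are 120

gamma(5.5)    # "factorial" of 4.5 via the gamma extension
factorial(0)  # gamma(1) = 1, so 0! comes for free
```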

A more interesting example is code to implement the Fibonacci series, which is extraordinarily wasteful when implemented with naive recursion. The recursive approach can be made efficient through memoization, but simple iteration is always to be preferred if performance is at stake.
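To make the contrast concrete, here is a minimal sketch of the three variants (the names `fib_naive`, `fib_memo`, and `fib_iter` are illustrative, not from any package):

```r
# Naive recursion recomputes the same subproblems: exponential time.
fib_naive <- function(n) if (n < 2) n else fib_naive(n - 1) + fib_naive(n - 2)

# Memoized version: caches results in an environment, so each value is
# computed once; linear time. Recall() refers to the executing function.
fib_memo <- local({
  memo <- new.env()
  function(n) {
    key <- as.character(n)
    if (exists(key, envir = memo, inherits = FALSE)) return(memo[[key]])
    memo[[key]] <- if (n < 2) n else Recall(n - 1) + Recall(n - 2)
    memo[[key]]
  }
})

# Simple iteration: linear time, no recursion, no cache.
fib_iter <- function(n) {
  a <- 0; b <- 1
  for (i in seq_len(n)) { tmp <- a + b; a <- b; b <- tmp }
  a
}
```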

Another common algorithm that is expressed naturally in a recursive way is Quicksort. Like any algorithm, it can be implemented without recursion, but doing so is quite complex. There is little benefit in using a non-recursive Quicksort, so it's common to use the naive recursive implementation.
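A naive recursive Quicksort fits in a few lines of R (`quicksort` is an illustrative name; in practice you would use base R's `sort()`):

```r
# Naive recursive quicksort: partition around the first element, recurse.
quicksort <- function(v) {
  if (length(v) <= 1) return(v)
  pivot <- v[1]
  rest  <- v[-1]
  c(quicksort(rest[rest < pivot]), pivot, quicksort(rest[rest >= pivot]))
}

quicksort(c(3, 1, 4, 1, 5, 9, 2, 6))  # 1 1 2 3 4 5 6 9
```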

Recursion is a good implementation choice:

  • if performance is not compromised, and
  • if it is more natural (hence easier to verify and maintain) to implement recursively.
David Heffernan
  • Do these guidelines apply to R specifically? Or put differently, would you say it doesn't matter if it's C++ or R, these guidelines apply to any language? – Milktrader Mar 11 '11 at 14:46
  • Note that these recommendations would be somewhat different for languages that implement tail call elimination, which (effectively) is an automatic transformation of certain types of recursion into iteration. – hadley Mar 11 '11 at 17:06
  • @hadley I think different language capabilities would affect how you would choose to implement algorithms, but I still think you'd follow the principles expressed in my two bullet points. It's just that if you had something like tail call elimination then the first bullet point would not be an issue. – David Heffernan Mar 11 '11 at 17:10
  • @David Could you provide any reference (can be websites here at Stack or anywhere) about what non-naive recursive functions would be? – Manoel Galdino Mar 11 '11 at 18:21
  • @manoel usually the naive, simple algorithms are the recursive ones, and the tricky ones are the non-recursive variants. – David Heffernan Mar 11 '11 at 18:30

I think that the most natural way to express mathematical computations in R is through mathematical/statistical notation. The whole point of the language is to make it easy to express statistical computations in a natural manner.

Your own example of factorial being implemented using gamma fits nicely with this view. I don't know how gamma is implemented, but we don't need to know that in order to use R. As a user, the most important thing is usually to get the right answer. If the code proves to be prohibitively slow, that is when you optimize. The first place to start would again be the math and the choice of algorithm, not implementation details.

David Heffernan is correct that recursion is usually slower than iteration, but what he doesn't seem to consider is that it rarely matters. Using his own example, the Fibonacci numbers, the really important thing is to avoid recomputing the numbers, which can be done through memoization. This makes the computation run in linear time instead of exponential time - a huge improvement. Once you have done that, you can still get a minor improvement by implementing the algorithm using a loop, but then it probably doesn't matter. Besides, there is a closed form.
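The closed form alluded to here is Binet's formula. A sketch in R (`fib_closed` is an illustrative name; in floating point the terms drift for large n, so the result is rounded to the nearest integer):

```r
# Binet's formula: F(n) = (phi^n - psi^n) / sqrt(5), with psi = -1/phi.
fib_closed <- function(n) {
  phi <- (1 + sqrt(5)) / 2              # golden ratio
  round((phi^n - (-phi)^(-n)) / sqrt(5))
}

fib_closed(10)  # 55
```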

Both the factorial function and the Fibonacci numbers also grow very quickly. This means that each arithmetic operation (addition, multiplication, etc.) will start to take a long time, while the cost of the recursion itself doesn't grow, or at least not as quickly. Again, mathematical considerations trump implementation details.

TL;DR

My advice is:

  1. Write the algorithm in the simplest possible way.
  2. Ensure that it is correct.
  3. If and only if the algorithm is too slow on realistic input:
    1. Find out which part of the algorithm is taking the time / what the time complexity is.
    2. Fix that part, and only that part.
    3. Repeat the process, if necessary.
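The workflow above can be sketched in a few lines; `slow_fib` is an illustrative stand-in for "the algorithm", written in the simplest possible way (step 1):

```r
# Step 1: the simplest possible implementation.
slow_fib <- function(n) if (n < 2) n else slow_fib(n - 1) + slow_fib(n - 2)

stopifnot(slow_fib(10) == 55)      # step 2: ensure it is correct
print(system.time(slow_fib(24)))   # step 3: only now worry about speed
# If it is too slow, Rprof() can show where the time goes; fix only that part.
```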
Jørgen Fogh

The answer is yes, recursive functions are used in R. While much of R is itself written in R, some highly optimized routines are wrappers to C or FORTRAN. Furthermore, much of base R consists of primitives. Basically, the fast routines where recursion is most likely to be used are the least likely to be seen, unless someone actually looks through the compiled code.

A good example of a recursive function is probably the most commonly used function of all:

> c
function (..., recursive = FALSE)  .Primitive("c")

> set.seed(1)
> x <- list('a'=list(1:2, list(rnorm(10)), 'b'=rnorm(3)))
> c(x)
$a
$a[[1]]
[1] 1 2

$a[[2]]
$a[[2]][[1]]
 [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684  0.4874291  0.7383247
 [9]  0.5757814 -0.3053884


$a$b
[1]  1.5117812  0.3898432 -0.6212406


> c(x, recursive=TRUE)
        a1         a2         a3         a4         a5         a6         a7         a8 
 1.0000000  2.0000000 -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684 
        a9        a10        a11        a12       a.b1       a.b2       a.b3 
 0.4874291  0.7383247  0.5757814 -0.3053884  1.5117812  0.3898432 -0.6212406 
> 
AdamO
  • That's not recursion as referred to by OP, it just means that the function is applied on nested lists in its arguments by flattening them (rather than just traversing the top level of the lists). Sure, that's *probably* implemented via recursion but not necessarily, and the argument name doesn't imply so. Note that the documentation doesn't specify how this is implemented – "recursively descends through lists" in this context is not an implementation hint but a description by analogy. – Konrad Rudolph Jan 17 '14 at 15:42
  • @KonradRudolph Thanks for the clarification. Given how R is so --forgive me if I'm wrong here-- strongly typed (??), it wouldn't surprise me if complexly nested lists are just long sequences of values with indicators for levels of nesting for each value. – AdamO Feb 02 '14 at 05:15