I know that loops are slow in R and that I should try to do things in a vectorised manner instead.

But, why? Why are loops slow and apply is fast? apply calls several sub-functions -- that doesn't seem fast.

Update: I'm sorry, the question was ill-posed. I was confusing vectorisation with apply. My question should have been,

"Why is vectorisation faster?"

Saranjith
isomorphismes
  • 3
    I was under the impression that the "apply is way way faster than for loops" in R is a bit of a [myth](http://stackoverflow.com/questions/1169573/large-loops-hang-in-r/1183739#1183739). Let the `system.time` wars in the answers begin... – joran Aug 22 '11 at 03:16
  • 1
    Lots of good information here on the topic: http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar – Chase Aug 22 '11 at 03:27
  • The premise of `apply` vs. `for`-loop question is just plain wrong. In every computer language it is always possible to write slow code. I am trying to figure out where this notion originated. Who is spreading this (mis)information? – IRTFM Aug 22 '11 at 04:34
  • See this answer to a previous question: http://stackoverflow.com/questions/6502444/slow-for-loop-in-r/6502720#6502720 – Andrie Aug 22 '11 at 06:16
  • 7
    For the record : Apply is NOT vectorization. Apply is a loop structure with different (as in: no) side effects. See the discussion @Chase links to. – Joris Meys Aug 22 '11 at 08:24
  • Another recent test on loops: http://leftcensored.skepsi.net/2011/08/21/the-performance-cost-of-a-for-loop-and-some-alternatives/ – Roman Luštrik Aug 22 '11 at 09:14
  • 4
    Loops in **S** (**S-Plus**?) were traditionally slow. This is not the case with **R**; as such, your Question isn't really relevant. I do not know what the situation with **S-Plus** is today. – Gavin Simpson Aug 22 '11 at 11:12
  • 5
    it is unclear to me why the question has been voted down heavily - this question is very common among those coming to R from other areas, and should be added to the FAQ. – patrickmdnet Aug 22 '11 at 22:36
  • 1
    @patrick It is a Question from a point of ignorance. Loops are **not** slow in R, and certainly not slower than `apply()`. Both loops and `apply()` will be slower than vectorized alternatives or compiled code. But so what? Use a vectorized function if one exists, or use a for loop properly (allocating storage etc) or *apply-family if one doesn't. Operating from a point of loops in R being slow is misinformed. It would be like asking "Why is the moon made of cream cheese?" One of the criteria for downvote is not showing research effort; this Q has been asked and answered *ad nauseam*. – Gavin Simpson Aug 23 '11 at 21:36
  • **NB:** [Changing the size of a variable](https://stat.ethz.ch/pipermail/r-help/2008-February/155617.html) each time in the loop slows it down a lot. (i.e., preallocating `x <- numeric(1e6)` then looping is a good thing to try.) – isomorphismes Apr 20 '15 at 18:50

4 Answers

It's not always the case that loops are slow and apply is fast. There's a nice discussion of this in the May, 2008, issue of R News:

Uwe Ligges and John Fox. R Help Desk: How can I avoid this loop or make it faster? R News, 8(1):46-50, May 2008.

In the section "Loops!" (starting on pg 48), they say:

Many comments about R state that using loops is a particularly bad idea. This is not necessarily true. In certain cases, it is difficult to write vectorized code, or vectorized code may consume a huge amount of memory.

They further suggest:

  • Initialize new objects to full length before the loop, rather than increasing their size within the loop.
  • Do not do things in a loop that can be done outside the loop.
  • Do not avoid loops simply for the sake of avoiding loops.
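
As a rough illustration of the second suggestion (not the example from the R News article), here is a minimal sketch in which a loop-invariant computation is moved out of the loop. The matrix `m` and the function names `scale_inside` and `scale_outside` are made up for this illustration.

```r
# Hypothetical data: a 200 x 10 matrix whose columns we want to rescale.
set.seed(1)
m <- matrix(runif(2000), nrow = 200)

# Wasteful: the column standard deviations do not depend on i,
# yet they are recomputed on every iteration.
scale_inside <- function(m) {
  out <- matrix(0, nrow(m), ncol(m))   # preallocated result
  for (i in seq_len(nrow(m))) {
    s <- apply(m, 2, sd)               # loop-invariant work inside the loop
    out[i, ] <- m[i, ] / s
  }
  out
}

# Better: compute the invariant once, before the loop starts.
scale_outside <- function(m) {
  out <- matrix(0, nrow(m), ncol(m))
  s <- apply(m, 2, sd)                 # computed a single time
  for (i in seq_len(nrow(m))) {
    out[i, ] <- m[i, ] / s
  }
  out
}

# Same result either way; only the amount of work differs.
stopifnot(isTRUE(all.equal(scale_inside(m), scale_outside(m))))

system.time(scale_inside(m))
system.time(scale_outside(m))
```

Both versions are still loops; the second is simply doing less inside the loop body.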

They have a simple example where a for loop takes 1.3 sec but apply runs out of memory.

Karl

Loops in R are slow for the same reason any interpreted language is slow: every operation carries around a lot of extra baggage.

Look at R_execClosure in eval.c (this is the function called to call a user-defined function). It's nearly 100 lines long and performs all sorts of operations -- creating an environment for execution, assigning arguments into the environment, etc.

Think how much less happens when you call a function in C (push args on to stack, jump, pop args).

So that is why you get timings like these (as joran pointed out in the comment, it's not actually apply that's being fast; it's the internal C loop in mean that's being fast. apply is just regular old R code):

A = matrix(as.numeric(1:100000))

Using a loop: 0.342 seconds:

system.time({
    Sum = 0
    for (i in seq_along(A)) {
        Sum = Sum + A[[i]]
    }
    Sum
})

Using sum: unmeasurably small:

sum(A)

It's a little disconcerting because, asymptotically, the loop is just as good as sum: both are linear in the length of A, so there's no algorithmic reason for the loop to be slow. It's just doing a lot of extra work on each iteration.
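
To make the comparison reproducible, here is a small sketch that wraps the loop in a function (the name `loop_sum` is made up here), checks that it agrees with `sum`, and times both. Timings will vary by machine and R version, so none are quoted.

```r
A <- as.numeric(1:100000)

# Element-by-element summation done by the R interpreter.
loop_sum <- function(x) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[[i]]
  }
  total
}

# Both approaches visit every element once and give the same answer;
# only the cost per element differs.
stopifnot(loop_sum(A) == sum(A))

system.time(loop_sum(A))   # loop executed by the interpreter
system.time(sum(A))        # loop executed inside compiled C code
```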

So consider:

# 0.370 seconds
system.time({
    I = 0
    while (I < 100000) {
        10
        I = I + 1
    }
})

# 0.743 seconds -- double the time just adding parentheses
system.time({
    I = 0
    while (I < 100000) {
        ((((((((((10))))))))))
        I = I + 1
    }
})

(That example was discovered by Radford Neal)

The second loop is slower because ( in R is an operator, that is, an ordinary function, and it actually requires a name lookup every time you use it. You can even redefine it:

> `(` = function(x) 2
> (3)
[1] 2

Or, in general, interpreted operations (in any language) have more steps. Of course, those steps provide benefits as well: you couldn't do that ( trick in C.
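
A short sketch of what "( is an operator" means in practice: the parsed expression contains one call to `(` per pair of parentheses, and each call has to be evaluated. The helper `count_paren_calls` is made up for this illustration; everything else is base R.

```r
# `(` and `{` are ordinary (primitive) functions that can be looked up by name.
stopifnot(is.function(`(`), is.primitive(`{`))

# A parenthesised expression parses to a call whose function is the symbol `(`.
e <- quote((1 + 2))
stopifnot(is.call(e), identical(e[[1]], as.name("(")))

# Hypothetical helper: count the calls to `(` in an expression tree.
count_paren_calls <- function(e) {
  if (!is.call(e)) return(0L)
  own <- if (identical(e[[1]], as.name("("))) 1L else 0L
  own + sum(vapply(as.list(e)[-1], count_paren_calls, integer(1)))
}

# Build the expression from the timing example: ten nested pairs around 10.
nested <- parse(text = paste0(strrep("(", 10), "10", strrep(")", 10)))[[1]]
count_paren_calls(nested)   # one call per pair of parentheses
```

So each pass through the second loop body evaluates ten extra function calls compared with the first.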

Owen
  • 11
    So what's the point of the last example? Don't do stupid things in R and expect it to do them quickly? – Chase Aug 22 '11 at 03:39
  • 6
    @Chase I guess that's one way to say it. Yeah I meant a language like C would have no speed difference with nested parentheses, but R doesn't optimize or compile. – Owen Aug 22 '11 at 03:41
  • 1
    Also (), or the { } in the loop body -- all these things involve name lookups. Or in general, in R when you write more, the interpreter does more. – Owen Aug 22 '11 at 03:44
  • 1
    I'm not sure what point you are trying to make with the `for()` loops? They aren't doing the same thing at all. The `for()` loop is iterating over each element of `A` and summing them. The `apply()` call is passing the entire vector `A[,1]` (your `A` has a single column) to a vectorised function `mean()`. I don't see how this helps the discussion and just confuses the situation. – Gavin Simpson Aug 22 '11 at 08:51
  • @Gavin True, `apply` isn't really being used at all in the second example. I should probably take that out. But they are doing almost the same thing: the long part of taking a `mean` is taking the sum. – Owen Aug 22 '11 at 09:29
  • 1
    Also I want to be clear that I'm not trying to insult R for being slow, because its speed is on the same order as any interpreted language. Rather, interpreted languages in general suffer from a "the fast way is the slow way" problem, in which code that is logically minimal still has the overhead of calling interpreted constructs; meaning that there is a strong incentive to use native routines whenever possible. – Owen Aug 22 '11 at 09:32
  • 3
    @Owen I agree with your general point, and it is an important one; we don't use R because it is breaking speed records, we use it because it is easy to use and very powerful. That power comes with the price of interpretation. It was just unclear what you were trying to show in the `for()` vs `apply()` example. I think you should remove that example as whilst the summation is the large part of computing the mean, all your example really shows is the speed of a vectorised function, `mean()`, over the C-like iteration over elements. – Gavin Simpson Aug 22 '11 at 11:08
  • Fixed this up a bit. I hope it's more clear what I'm saying now. – Owen Aug 24 '11 at 04:22
  • Are for loops still slow in R? Or has the base code been altered since this answer was written? – baxx Mar 01 '19 at 18:54

The only Answer to the Question as posed is that loops are not slow if what you need to do is iterate over a set of data performing some function, and that function or operation is not vectorized. A for() loop will, in general, be as quick as apply(), but possibly a little slower than an lapply() call. The last point is well covered on SO, for example in this Answer, and applies if the code involved in setting up and operating the loop is a significant part of the overall computational burden of the loop.

Many people think for() loops are slow because they, the users, are writing bad code. In general (though there are several exceptions), if you need to expand/grow an object, that will involve copying it too, so you incur the overhead of both copying and growing the object. This is not restricted to loops, but if you copy/grow at each iteration of a loop, the loop is of course going to be slow, because you are incurring many copy/grow operations.

The general idiom for using for() loops in R is that you allocate the storage you require before the loop starts, and then fill in the object thus allocated. If you follow that idiom, loops will not be slow. This is what apply() manages for you, but it is just hidden from view.
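
A minimal sketch of that idiom, with made-up function names `grow` and `prealloc`. The growing version uses `c()` so that the object is rebuilt on every iteration; how large the gap is will depend on `n` and on the R version.

```r
n <- 1e4

# Anti-pattern: the result is extended (and therefore copied) on every iteration.
grow <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) {
    x <- c(x, i^2)
  }
  x
}

# Idiom: allocate the full-length result first, then fill it in.
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) {
    x[i] <- i^2
  }
  x
}

# Identical output; only the allocation pattern differs.
stopifnot(identical(grow(n), prealloc(n)))

system.time(grow(n))
system.time(prealloc(n))
```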

Of course, if a vectorised function exists for the operation you are implementing with the for() loop, use that function instead of the loop. Likewise, don't use apply() etc if a vectorised function exists (e.g. apply(foo, 2, mean) is better performed via colMeans(foo)).
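
For the apply(foo, 2, mean) case, a quick sketch showing that the two calls agree. Here `foo` is just simulated data, since the original does not define it.

```r
# Hypothetical data standing in for `foo`: 1000 rows, 50 columns.
set.seed(42)
foo <- matrix(rnorm(1000 * 50), nrow = 1000)

via_apply    <- apply(foo, 2, mean)   # R-level loop over columns, calling mean() each time
via_colMeans <- colMeans(foo)         # a single call into compiled code

# Same values, up to floating-point tolerance.
stopifnot(isTRUE(all.equal(via_apply, via_colMeans)))

system.time(for (k in 1:100) apply(foo, 2, mean))
system.time(for (k in 1:100) colMeans(foo))
```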

Saranjith
Gavin Simpson

Just as a comparison (don't read too much into it!): I ran a (very) simple for loop in R and in JavaScript in Chrome and IE 8. Note that Chrome does compilation to native code, and R with the compiler package compiles to bytecode.

# In R 2.13.1, this took 500 ms
f <- function() { sum<-0.5; for(i in 1:1000000) sum<-sum+i; sum }
system.time( f() )

# And the compiled version took 130 ms
library(compiler)
g <- cmpfun(f)
system.time( g() )

@Gavin Simpson: Btw, it took 1162 ms in S-Plus...

And the "same" code as JavaScript:

// In IE8, this took 282 ms
// In Chrome 14.0, this took 4 ms
function f() {
    var sum = 0.5;
    for(i=1; i<=1000000; ++i) sum = sum + i;
    return sum;
}

var start = new Date().getTime();
f();
time = new Date().getTime() - start;
Tommy