Loops in R - Need to use index, anyway to avoid 'for'?

Question

I know that using a for loop is not considered best practice in R because it tends to perform poorly. In almost all cases, a function from the *apply family solves the problem.

However, I'm facing a situation where I don't see a workaround.

I need to calculate the percent variation between consecutive values:

pv <- numeric(length(x))  # create pv before assigning into it
pv[1] <- 0
for (i in 2:length(x)) {
  pv[i] <- (x[i] - x[i-1]) / x[i-1]
}

So, as you can see, I have to use both the x[i] element and the x[i-1] element. With the *apply functions, I only see how to use x[i]. Is there any way I can avoid the for loop?

Tyler Rinker · Answer 1 · 2012-05-08T02:14:33.243

20

You can get the same results with:

# iterate over the indices so both x[i] and x[i-1] are available
y <- sapply(2:length(x), function(i) (x[i] - x[i-1]) / x[i-1])
pv <- c(0, y)

The performance problems that for loops once had have largely been optimized away. Often a for loop is not slower and may even be faster than the apply solution. You have to test them both and see. I'm betting your for loop is faster than my solution.

EDIT: To illustrate the for loop vs. apply comparison, as well as what DWin says about vectorization, I benchmarked the four solutions with microbenchmark on a Windows 7 machine.

Unit: microseconds
             expr     min      lq  median      uq       max
1    DIFF_Vincent  22.396  25.195  27.061  29.860  2073.848
2        FOR.LOOP 132.037 137.168 139.968 144.634 56696.989
3          SAPPLY 146.033 152.099 155.365 162.363  2321.590
4 VECTORIZED_Dwin  18.196  20.063  21.463  23.328   536.075
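
The script behind these timings isn't shown in the thread, so here is a rough sketch of how such a comparison could be set up. The input vector, its length, the seed, and the function names are all assumptions, and the timing step assumes the microbenchmark package is installed. Expect different numbers on a different machine or R version.

```r
# Hypothetical input: 1000 positive values (size and seed are assumptions)
set.seed(1)
x <- runif(1000, min = 1, max = 100)

for_loop <- function(x) {
  pv <- numeric(length(x))            # preallocate; pv[1] stays 0
  for (i in 2:length(x)) {
    pv[i] <- (x[i] - x[i - 1]) / x[i - 1]
  }
  pv
}

sapply_version <- function(x) {
  c(0, sapply(2:length(x), function(i) (x[i] - x[i - 1]) / x[i - 1]))
}

vectorized <- function(x) {
  pv <- numeric(length(x))
  # no * 100 here, so all four versions return proportions
  pv[-1] <- (x[-1] - x[-length(x)]) / x[-length(x)]
  pv
}

diff_version <- function(x) {
  c(0, diff(x) / x[-length(x)])
}

# Check that all four agree before timing them
stopifnot(
  isTRUE(all.equal(for_loop(x), sapply_version(x))),
  isTRUE(all.equal(for_loop(x), vectorized(x))),
  isTRUE(all.equal(for_loop(x), diff_version(x)))
)

# Timing step; skipped when microbenchmark is not installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    FOR.LOOP        = for_loop(x),
    SAPPLY          = sapply_version(x),
    VECTORIZED_DWin = vectorized(x),
    DIFF_Vincent    = diff_version(x),
    times = 100
  ))
}
```

Wrapping each approach in a function keeps the comparison like-for-like: each call builds its result from scratch, and the equality check guards against timing code that returns something different.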

edited May 08 '12 at 02:14

answered May 06 '12 at 01:28

Tyler Rinker

What is the "DIF" version and what does the whole test look like? The solution from @VincentZoonekynd runs the fastest for me. – Tommy May 06 '12 at 08:44
Should have been DIFF for diff (Vincent's). To be fair with the benchmarking, don't forget to take the *100 out of DWin's solution, as this adds extra computation: it returns a percent (not a proportion like everyone else's solution). – Tyler Rinker May 06 '12 at 11:51
I tweaked the names on things a bit so it's clearer what's what in the benchmarking – Tyler Rinker May 06 '12 at 11:56
Educational. People could cite this answer for future Q's about efficiency of loop constructs. – IRTFM May 06 '12 at 17:22

IRTFM · Accepted Answer · 2012-05-06T03:31:22.890

18

What you offered would be the fractional variation, but if you multiply by 100 you get the "percent variation":

pv<- vector("numeric",length(x))
pv[1] <- 0
pv[-1] <- 100* ( x[-1] - x[-length(x)] )/ x[-length(x)]

This is a vectorized solution. (And you should note that for-loops are going to be just as slow as *apply solutions, just not as pretty. Always look for a vectorized approach.)

To explain a bit more: x[-length(x)] is the vector x[1:(length(x)-1)], and x[-1] is the vector x[2:length(x)]. The vector operations in R do the same work as your for-loop body, although without an explicit loop. R first constructs the differences of those shifted vectors, x[-1] - x[-length(x)], and then divides by x[1:(length(x)-1)].
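
A small worked example may make the shifted vectors easier to see. The values in x are made up for illustration only.

```r
x <- c(10, 12, 9, 18)     # made-up values, for illustration only

x[-1]                     # drops the first element: 12 9 18
x[-length(x)]             # drops the last element:  10 12 9
x[-1] - x[-length(x)]     # consecutive differences: 2 -3 9

# Element-wise arithmetic on the shifted vectors replaces the loop body
pv <- numeric(length(x))
pv[-1] <- 100 * (x[-1] - x[-length(x)]) / x[-length(x)]
pv                        # 0 20 -25 100
```

Each position in x[-1] lines up with its predecessor in x[-length(x)], which is exactly the x[i] and x[i-1] pairing from the loop.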

edited May 06 '12 at 03:31

answered May 06 '12 at 01:28

IRTFM

Nice response DWin. I didn't actually know what the poster was accomplishing but I'm in 100% agreement on the vectorization. +1 – Tyler Rinker May 06 '12 at 01:37
Very nice answer! I didn't know that the vectorized approach was the fastest; I thought `lapply` was. But in the last line of code, shouldn't it be `x[-1] - x[-length(x)]`? – João Daniel May 06 '12 at 02:14
@JoãoDaniel: Yes, it should. Edit applied. – IRTFM May 06 '12 at 03:31

score 16 · Answer 3 · answered May 06 '12 at 01:49

16

You can also use diff:

c( 0, diff(x) / x[-length(x)] )
c( 0, exp(diff(log(x))) - 1 )
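
As a quick check of why the two one-liners agree: diff(x) returns x[i+1] - x[i], and diff(log(x)) returns log(x[i+1] / x[i]), so exponentiating and subtracting 1 gives the same ratio. The sketch below uses made-up values and assumes x is strictly positive, since the log form is not defined otherwise.

```r
x <- c(10, 12, 9, 18)                  # made-up, strictly positive values

a <- c(0, diff(x) / x[-length(x)])     # diff(x)[i] is x[i + 1] - x[i]
b <- c(0, exp(diff(log(x))) - 1)       # exp(log(x[i + 1] / x[i])) - 1

a                                      # 0.00 0.20 -0.25 1.00
all.equal(a, b)                        # TRUE, up to floating-point tolerance
```

Comparing with all.equal rather than == is deliberate: the log/exp round trip can introduce tiny floating-point differences.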

answered May 06 '12 at 01:49

Vincent Zoonekynd

+1 This seems to be the fastest... And I like the log/exp variant although it isn't as fast. – Tommy May 06 '12 at 08:47
Hat tip: I have to admit that the diff() approach is somewhat more elegant than my literal translation to a vectorized solution. I was surprised it didn't benchmark better. – IRTFM May 06 '12 at 17:19
@DWin - when I benchmark, using `diff` or not makes no difference. But using `c` instead of your replacement is much faster. Something is fishy about Tyler's numbers. I ran it on `x <- runif(1e7)`... – Tommy May 07 '12 at 09:15
@Tommy: We should be comparing apples to apples. (There needs to be a replacement operation in the version using `diff` for the comparison to be informative. ) – IRTFM May 07 '12 at 12:04
