14

I know that using a for loop in R is not considered best practice because it tends to perform poorly. In almost all cases, a function from the *apply family solves the problem.

However I'm facing a situation where I don't see a workaround.

I need to calculate percent variation for consecutive values:

pv <- numeric(length(x))  # preallocate; pv[1] stays 0
for (i in seq_along(x)[-1]) {
  pv[i] <- (x[i] - x[i-1]) / x[i-1]
}

So, as you can see, I have to use both the x[i] element and the x[i-1] element. With the *apply functions, I only see how to use x[i]. Is there any way I can avoid the for loop?

João Daniel

3 Answers

20

You can get the same results with:

# sapply over the indices 2..n; no global assignment (<<-) is needed
y <- sapply(2:length(x), function(i) (x[i] - x[i-1]) / x[i-1])
pv <- c(0, y)

The for loop issues that once were a problem have been optimized. Often a for loop is not slower and may even be faster than the apply solution. You have to test them both and see. I'm betting your for loop is faster than my solution.

EDIT: To illustrate the for loop vs. apply comparison, as well as what DWin says about vectorization, I benchmarked the four solutions with microbenchmark on a Windows 7 machine.

Unit: microseconds
             expr     min      lq  median      uq       max
1    DIFF_Vincent  22.396  25.195  27.061  29.860  2073.848
2        FOR.LOOP 132.037 137.168 139.968 144.634 56696.989
3          SAPPLY 146.033 152.099 155.365 162.363  2321.590
4 VECTORIZED_Dwin  18.196  20.063  21.463  23.328   536.075
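The exact benchmark script is not shown here, so the following is only a sketch of how such a comparison could be set up. The function names, the test vector, and `times = 100L` are assumptions made for illustration, and DWin's version is written without the `* 100` so that all four return proportions.

```r
# Hypothetical reconstruction of the benchmark; names and data are illustrative.
set.seed(1)
x <- runif(1000, min = 1, max = 100)  # assumed test data, strictly positive

for_loop <- function(x) {
  pv <- numeric(length(x))
  for (i in seq_along(x)[-1]) {
    pv[i] <- (x[i] - x[i - 1]) / x[i - 1]
  }
  pv
}

sapply_version <- function(x) {
  c(0, sapply(seq_along(x)[-1], function(i) (x[i] - x[i - 1]) / x[i - 1]))
}

vectorized <- function(x) {
  pv <- numeric(length(x))
  pv[-1] <- (x[-1] - x[-length(x)]) / x[-length(x)]
  pv
}

diff_version <- function(x) {
  c(0, diff(x) / x[-length(x)])
}

# All four should agree before it makes sense to time them.
stopifnot(
  isTRUE(all.equal(for_loop(x), sapply_version(x))),
  isTRUE(all.equal(for_loop(x), vectorized(x))),
  isTRUE(all.equal(for_loop(x), diff_version(x)))
)

# Time them only if the microbenchmark package is installed.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    DIFF_Vincent    = diff_version(x),
    FOR.LOOP        = for_loop(x),
    SAPPLY          = sapply_version(x),
    VECTORIZED_Dwin = vectorized(x),
    times = 100L
  ))
}
```

Timings will differ by machine, R version, and vector length, so the numbers in the table above should not be expected to reproduce exactly.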


Tyler Rinker
  • What is the "DIF" version and what does the whole test look like? The solution from @VincentZoonekynd runs the fastest for me. – Tommy May 06 '12 at 08:44
  • Should have been DIFF for diff (Vincent's). To be fair with the benchmarking, don't forget to take the *100 out of DWin's solution, as this adds extra computation: it gives a percent (not a proportion like everyone else's solution). – Tyler Rinker May 06 '12 at 11:51
  • I tweaked the names on things a bit so it's clearer what's what in the benchmarking – Tyler Rinker May 06 '12 at 11:56
  • Educational. People could cite this answer for future Q's about efficiency of loop constructs. – IRTFM May 06 '12 at 17:22
18

What you offered would be the fractional variation, but if you multiply by 100 you get the "percent variation":

pv<- vector("numeric",length(x))
pv[1] <- 0
pv[-1] <- 100* ( x[-1] - x[-length(x)] )/ x[-length(x)]

Vectorized solution. ( And you should note that for-loops are going to be just as slow as *apply solutions ... just not as pretty. Always look for a vectorized approach.)

To explain a bit more: x[-length(x)] is the vector x[1:(length(x)-1)], and x[-1] is the vector x[2:length(x)]. The vector operations in R do the same work as your for-loop body, just without an explicit loop. R first constructs the differences of those shifted vectors, x[-1] - x[-length(x)], and then divides by x[1:(length(x)-1)].
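A small worked example may make the shifted vectors easier to see. The values of `x` and the helper names `current` and `previous` are made up for illustration.

```r
x <- c(100, 110, 99)          # made-up example values

current  <- x[-1]             # 110  99  (same as x[2:length(x)])
previous <- x[-length(x)]     # 100 110  (same as x[1:(length(x) - 1)])

pv <- numeric(length(x))      # first element stays 0
pv[-1] <- 100 * (current - previous) / previous
pv                            # 0, 10, -10 (percent)
```

Each element of `current` is paired with the element just before it in `x`, which is exactly the x[i] and x[i-1] pairing from the loop.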

IRTFM
  • Nice response DWin. I didn't actually know what the poster was accomplishing but I'm in 100% agreement on the vectorization. +1 – Tyler Rinker May 06 '12 at 01:37
  • Very nice answer! I didn't know that the vectorized approach was the fastest, I thought `lapply` was. But in the last line of code, shouldn't it be `x[-1] - x[-length(x)]`? – João Daniel May 06 '12 at 02:14
  • @JoãoDaniel: Yes, it should. Edit applied. – IRTFM May 06 '12 at 03:31
16

You can also use diff:

c( 0, diff(x) / x[-length(x)] )
c( 0, exp(diff(log(x))) - 1 )
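
As a rough check of why these two lines agree, here is a sketch with made-up values. It also shows one caveat that is an observation about `log()` rather than something stated above: the log/exp form only works for strictly positive `x`.

```r
x <- c(100, 110, 99)   # made-up, strictly positive values

# diff(x) is the lag-1 difference, i.e. x[-1] - x[-length(x)]
stopifnot(isTRUE(all.equal(diff(x), x[-1] - x[-length(x)])))

pv_diff <- c(0, diff(x) / x[-length(x)])
pv_log  <- c(0, exp(diff(log(x))) - 1)  # exp(log(a) - log(b)) - 1 equals a / b - 1

stopifnot(isTRUE(all.equal(pv_diff, pv_log)))

# With a non-positive value, log() returns NaN (with a warning),
# while the diff form still returns a number.
y <- c(-2, 4)
pv_y_diff <- c(0, diff(y) / y[-length(y)])                 # second element: (4 - (-2)) / -2
pv_y_log  <- suppressWarnings(c(0, exp(diff(log(y))) - 1)) # second element is NaN
```
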
Vincent Zoonekynd
  • +1 This seems to be the fastest... And I like the log/exp variant although it isn't as fast. – Tommy May 06 '12 at 08:47
  • Hat tip: I have to admit that the diff() approach is somewhat more elegant than my literal translation to a vectorized solution. I was surprised it didn't benchmark better. – IRTFM May 06 '12 at 17:19
  • @DWin - when I benchmark, using `diff` or not makes no difference. But using `c` instead of your replacement is much faster. Something is fishy about Tyler's numbers. I ran it on `x <- runif(1e7)`... – Tommy May 07 '12 at 09:15
  • @Tommy: We should be comparing apples to apples. (There needs to be a replacement operation in the version using `diff` for the comparison to be informative. ) – IRTFM May 07 '12 at 12:04