
Possible Duplicate:
Speed up the loop operation in R

I have a few questions regarding loops. I know that R works faster with vectorized calculations, and I would like to change the code below to take advantage of this. Looking at other answers on this site, the sapply function seems able to replace the inner for loop, but my attempt just generates a vector of zeros, so something is wrong. tao remains 1000, and I think this is creating the problem.

My primary concern is speed, as I need to wrap a loop around the entire algorithm and run it for different V and n sizes for some further analysis.

Thanks for your help

Alternative loop

tao = 1000
L = (tao - 1)   
n = 10      
V = 5               
I = 10000                       
V_s = matrix(rnorm(I), I, 1)
V_b = matrix(rnorm(I), I, 1)

signal <- matrix(0, L, 1)  

for( j in (n:L)){

    sapply(((j-n+1):j),function (tao) signal[j] = signal[j] + abs(V_s[tao] - V_b[tao]))

    signal[j] = (signal[j] / (n * V) )

} 

Original loop

tao = 1000
L = (tao - 1)   
n = 10      
V = 5               
I = 10000                       
V_s = matrix(rnorm(I), I, 1)
V_b = matrix(rnorm(I), I, 1)

signal <- matrix(0, L, 1)  

for( j in (n:L)){

    for( tao in ((j-n+1):j))    {

        signal[j] = (signal[j] + abs(V_s[tao] - V_b[tao]))

    }
        signal[j] = (signal[j] / (n * V) )

}
3 Answers


Using filters, you can do your computation even without any loop (and sapply is nothing more than a hidden loop).

absdif <- abs(V_s - V_b)
signal <- filter(absdif[1:L], rep(1/(n*V), n), sides=1)
signal[is.na(signal)] <- 0

Understanding what is happening in the second line is not trivial when you're not used to filters, though. Let's have a closer look:

First we compute the absolute differences of V_s and V_b, which your loop uses repeatedly. Then comes the filter. Your computation is nothing more than summing up the n past values at each index j. Thus, we have something like

signal[j] <- sum(absdif[(j-n+1):j])

That is exactly what convolution filters do: they sum up values, in general after multiplying each by a weight. Here we choose the weight 1/(n*V) for all values, which corresponds to the normalization you do in your outer loop. The last argument, sides=1, simply tells the filter to take values only from the past (sides=2 would mean sum(absdif[(j-n/2):(j+n/2)])).

The last line just fills up the NA values at the beginning, where the filter does not have enough data to compute the sum. This is equivalent to skipping the first n - 1 values, as your loop does.
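As a minimal sketch of the mechanics (on a toy vector, not the question's data): with sides = 1 and equal weights, filter() computes a trailing weighted sum at each position and returns NA wherever the window does not fit.

```r
# Toy illustration of stats::filter with sides = 1:
# y[t] = 0.5 * x[t] + 0.5 * x[t - 1], NA where the window does not fit
x <- c(1, 2, 3, 4, 5)
y <- stats::filter(x, rep(1/2, 2), sides = 1)
y  # NA 1.5 2.5 3.5 4.5
```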

Finally, some timing:

Your full-loop solution:

   User      System       total 
  0.037       0.000       0.037 

The solution of juba:

   User      System       total 
  0.007       0.000       0.008 

The solution using filters:

   User      System       total 
  0.000       0.000       0.001 

Note that the concept of filtering is really well researched, and filters can be computed incredibly fast.

Edit: As noted in ?filter, R does not use the fast Fourier transform (FFT) in the standard filter command. The FFT is usually the most efficient way to compute convolutions. However, even this can be had by replacing the filter call with

signal <- convolve(absdif[1:L], rep(1/(n*V), n), type='filter')

Note that now the first n - 1 entries are dropped entirely instead of set to NA. The result is otherwise the same. Timing is of no use here: the total time is below the three-digit resolution of system.time. However, note the following remark in the R help for filter:

convolve(, type="filter") uses the FFT for computations and so may be faster for long filters on univariate series, but it does not return a time series (and so the time alignment is unclear), nor does it handle missing values. filter is faster for a filter of length 100 on a series of length 1000, for example
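A small check of the equivalence on toy data (since convolve uses the FFT, expect tiny floating-point differences rather than exact equality):

```r
# filter() and convolve(type = "filter") compute the same moving sums;
# convolve drops the leading entries instead of returning NA there
x <- c(1, 2, 3, 4, 5)
w <- rep(1/2, 2)
f1 <- stats::filter(x, w, sides = 1)   # NA 1.5 2.5 3.5 4.5
f2 <- convolve(x, w, type = "filter")  # 1.5 2.5 3.5 4.5 (up to FFT rounding)
all.equal(as.numeric(f1[-1]), f2)      # TRUE
```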

Thilo
  • Nice solution, didn't know about this use of `filter()`. – juba Jan 25 '13 at 11:13
  • +1, although filter is also probably a loop in disguise... – Paul Hiemstra Jan 25 '13 at 11:16
  • @PaulHiemstra Maybe, but a very efficient one it seems :) – juba Jan 25 '13 at 11:24
  • @Paul Well, vectorization always is just a hidden loop, but with different levels of optimization in between. And filters should be on the same level of optimization as, say, the `abs` operator is. However, while thinking about it, we can still do better. See my edit in a second. – Thilo Jan 25 '13 at 11:27
  • Thanks for the answer, I ran your suggestions and received a different output than I was expecting. Should I substitute your solution for both my for loops? – Morten Jan 25 '13 at 11:59
  • @juba often the loop is in the underlying `C` code, making it much faster than using the slow `R` loops. – Paul Hiemstra Jan 25 '13 at 12:05
  • @Morten Yep, replace with both for-loops. How does the output differ? For me it was exactly the same as your solution. – Thilo Jan 25 '13 at 12:06
  • @Thilo, I got it to work. However the output from 'signal' is not exactly the same as in juba's solution. The difference doesn't appear until the 10th decimal place, so it has practically no influence, and summary stats are the same, but I was just curious why the output isn't identical given the same input data? – Morten Jan 25 '13 at 12:23
  • You are running into floating point accuracy problems now. See http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f for more information. If your problem is solved, you can mark one answer as accepted by clicking on the gray checkmark next to it. – Thilo Jan 25 '13 at 12:32
  • @Thilo, thanks for your help. If I were to run a simulation with varying V and n inputs and saving these outputs into a table or data.frame would you recommend running a loop around it, or an approach similar to your suggestion? – Morten Jan 25 '13 at 12:40
  • I can't see a powerful simplification in that case. You're probably good using a simple loop. – Thilo Jan 25 '13 at 13:08

Vectorizing calculations doesn't always mean using an *apply function.

For example, you can simplify and speed up things by replacing your inner for loop with vector indexing:

for(j in (n:L)){
  sel <- (j-n+1):j
  signal[j] <- sum(abs(V_s[sel] - V_b[sel])) / (n*V)
}
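Another loop-free variant along the same lines (a sketch not taken from any of the answers, reusing the question's setup with rnorm stand-in data): cumulative sums turn each length-n window sum into a difference of two cumsum entries.

```r
set.seed(1)  # reproducible stand-in for the question's random data
tao <- 1000; L <- tao - 1; n <- 10; V <- 5; I <- 10000
V_s <- rnorm(I); V_b <- rnorm(I)

absdif <- abs(V_s - V_b)
cs <- c(0, cumsum(absdif[1:L]))  # cs[j + 1] == sum(absdif[1:j])
signal <- numeric(L)
# window sum over (j-n+1):j is cs[j + 1] - cs[j + 1 - n]
signal[n:L] <- (cs[(n + 1):(L + 1)] - cs[1:(L + 1 - n)]) / (n * V)
```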

For this solution, the execution time on my system is :

   user  system elapsed 
  0.008   0.004   0.009 

Whereas for your for loops it is :

   user  system elapsed 
   0.06    0.00    0.06 

By the way, you shouldn't use the name tao for two different things.

juba

Assuming your explicit loop gives the correct result, try this replacement for the inner loop:

 signal[j] <- signal[j] +
               sum(sapply((j-n+1):j,
                    FUN = function(iter) {
                            abs(V_s[iter] - V_b[iter])
                    }))

Note that sapply returns the absolute difference between V_s and V_b at each index; sum collapses that vector to a single value, which is then added to signal[j].

agstudy