2

I have a simple vector as follows:

x = c(14.24, 13.82, 12.75, 12.92, 12.94, 13.00, 14.14, 16.28, 20.64, 17.64)

I am trying to find the rolling EMA of this vector using the following function -

library(TTR)
y = EMA(x, 5)

I am getting the result as follows -

 [1]     NA     NA     NA     NA 13.33400 13.22267 13.52844 14.44563 16.51042 16.88695

However, I want the result as below -

 [1]     14.24 14.03 13.06 13.43 13.33400 13.22267 13.52844 14.44563 16.51042 16.88695
  1. The first value should be the same as in the original vector
  2. The second value should be the EMA of the first and second value
  3. The third value should be the EMA of the initial three values in the vector
  4. The fourth value should be the EMA of the initial four values in the vector

The rest of the calculation is correctly handled by the function EMA

Solutions I tried -

  1. Running the following command - zoo::rollapplyr(x, width = 5, FUN = EMA, partial = TRUE) will give error as EMA has its own rolling window.

  2. Using function stats::filter works but the answer is not correct as I am not sure about the right value of the ratio parameter. Fast R implementation of an Exponentially Weighted Moving Average? Here is a custom function.

ema_2 <- function (k, width) {
  ratio <- 2/(width + 1)
  c(stats::filter(k * ratio, 1 - ratio, "convolution", init = k[1]))
}

The ideal solution should take at most twice the computation time as taken by EMA function of TTR library.

Here are the performance results of the 2 solutions shared by Waldi and Andre.

              expr     min       lq     mean   median       uq      max neval cld
    TTR::EMA(x, 5) 433.593 457.5815 500.9478 477.0535 530.7105  1128.49  1000   a
        EMA3(x, 5) 445.388 468.9585 515.2009 490.4345 546.5025  1843.46  1000   a
 rollmeanEMA(x, 5) 436.689 461.0885 535.7035 481.8815 538.3150 33122.75  1000   a

Thanks!

Saurabh
  • 1,566
  • 10
  • 23

2 Answers2

3

Looking at C source code of EMA shows that the first value is the mean of the averaging window:

    /* Raw mean to start EMA */
    double seed = 0.0;
    for(i = first; i < first + i_n; i++) {
      d_result[i] = NA_REAL;
      seed += d_x[i] / i_n;
    }
    d_result[first + i_n - 1] = seed;

This can easily be calculated to replace the NAs :

x = c(14.24, 13.82, 12.75, 12.92, 12.94, 13.00, 14.14, 16.28, 20.64, 17.64)

EMA2 <- function(x,n) {
  y = TTR::EMA(x, n)
  noNA <- which.min(is.na(x))
  y[noNA:(noNA+n-2)] <- cumsum(x[noNA:(noNA+n-2)])/1:(n-1)
  y
}

EMA2(x,5)
#>  [1] 14.24000 14.03000 13.60333 13.43250 13.33400 13.22267 13.52844 14.44563
#>  [9] 16.51042 16.88695

this also works with leading NAs:

x = c(NA, NA, 14.24, 13.82, 12.75, 12.92, 12.94, 13.00, 14.14, 16.28, 20.64, 17.64)
EMA2(x,5)
#> [1]       NA       NA 14.24000 14.03000 13.60333 13.43250 13.33400 13.22267 13.52844 14.44563
#> [11] 16.51042 16.88695

The calculation overhead on this short vector is minimal, and this should be even better on a longer vector :

microbenchmark::microbenchmark(TTR::EMA(x,5),EMA2(x,5),times=1000)

#> Unit: microseconds
#>           expr   min    lq     mean median     uq   max neval cld
#> TTR::EMA(x, 5) 157.7 161.8 181.6156  164.0 180.55 593.5  1000  a 
#>     EMA2(x, 5) 164.2 167.5 193.0643  170.6 193.20 857.1  1000   b
Waldi
  • 39,242
  • 6
  • 30
  • 78
  • It seems like ```cumsum(x[1:(n-1)])/1:(n-1)``` is calculating simple moving average (SMA) for initial ```n``` values and for remaining values, EMA is calculated. Is my understanding correct? – Saurabh Jan 09 '21 at 18:42
  • Yes, this is correct and I checked that EMA does exactly the same – Waldi Jan 09 '21 at 19:16
  • Can you please adjust the function to handle NA values? Here is an example vector x = c(NA, NA, 14.24, 13.82, 12.75, 12.92, 12.94, 13.00, 14.14, 16.28, 20.64, 17.64). Running EMA2 function on this vector produces all NA values initially. – Saurabh Jan 09 '21 at 19:25
  • adjusted for leading NAs – Waldi Jan 09 '21 at 20:56
  • Thank you! EMA2 now works great on vectors with leading NA values. – Saurabh Jan 10 '21 at 04:22
2

This gives the desired result:

require(TTR)

x <- c(14.24, 13.82, 12.75, 12.92, 12.94, 13.00, 14.14, 16.28, 20.64, 17.64)

rollmeanEMA <- function(vec, len) {
  c(cumsum(vec[1:(len-1)]) / seq_along(vec[1:(len-1)]),
    EMA(vec, len)[len:length(vec)])
}

rollmeanEMA(x,5)
#[1] 14.24000 14.03000 13.60333 13.43250 13.33400 13.22267 13.52844 14.44563
#[9] 16.51042 16.88695

Edit: As I suggested in the comments, replacing the NA part with mean(). This gives a tremendous speedup. Plus, removed the surrounding condition.

y <- rnorm(1000000)

system.time( rollmeanEMA(y,10000) )
#   user  system elapsed
#  0.031   0.003   0.034

system.time( EMA(y,10000) )
#   user  system elapsed
#  0.018   0.002   0.019

Added NA "handling":

rollmeanEMA <- function(vec, len) {
  v_n <- !is.na(vec)
  c( vec[is.na(vec)],
     cumsum(vec[v_n][1:(len-1)]) / seq_along(vec[v_n][1:(len-1)]),
     EMA(vec[v_n], len)[len:length(vec[v_n])])
}
Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29
  • 1
    That's because its called 6 times. Filling the NAs is done by a separate call to EMA. To achieve better runtimes you probably have to hack EMA itself. – Andre Wildberg Jan 06 '21 at 16:16
  • 1
    Yes. In a highly performant, heavily used application it's worth doing it. If you're just calling the function once per minute it's probably bearable. – Andre Wildberg Jan 06 '21 at 16:25
  • 1
    If your runs are independent you can put them in a parallel environment with `parSapply`. See here https://www.rdocumentation.org/packages/parallel/versions/3.6.2 – Andre Wildberg Jan 06 '21 at 16:36
  • 1
    I think its fine to replace the NA part with a simple mean. Replace `EMA(x,i)[i]` in the 4th line of the function with `mean(x[1:i])`. Definitely test the results, but I think it should be equivalent. If this gives you the right values I can incorporate it into the answer. – Andre Wildberg Jan 06 '21 at 17:44
  • Thanks! rollmeanEMA function is now faster than EMA2 shared by Waldi. However, it does not handle the leading NAs in the vector. x = c(NA, NA, 14.24, 13.82, 12.75, 12.92, 12.94, 13.00, 14.14, 16.28, 20.64, 17.64). Is it possible for you to add the NA handling? Also, please remove the unlist function from the answer as it is no more required after the updated function. – Saurabh Jan 10 '21 at 17:43
  • 1
    I added `NA` handling. Probably makes it slower again :) Couldn't optimize it further with the current approach. – Andre Wildberg Jan 10 '21 at 19:29
  • Thanks a lot Andre. Function rollmeanEMA is a bit faster and more resilient than the EMA2 function written by Waldi. Function rollmeanEMA also handles the corner case where vector contains NA in between e.g. x = c(NA, NA, 14.24, 13.82, NA, 12.75, 12.92, 12.94, NA, 13.00, 14.14, 16.28, 20.64, 17.64), in which EMA2 breaks. I am accepting your answer. Sorry, I have already awarded 50 points of reputation to Waldi a day before. – Saurabh Jan 11 '21 at 05:55