Calculating moving average

Question

I'm trying to use R to calculate the moving average over a series of values in a matrix. There doesn't seem to be a built-in function in R that will allow me to calculate moving averages. Do any packages provide one? Or do I need to write my own?

score 251 · Answer 1 · edited Mar 28 '19 at 16:16

Alternatively, you can simply calculate it using filter. Here's the function I use:

ma <- function(x, n = 5){filter(x, rep(1 / n, n), sides = 2)}

If you use dplyr, be careful to specify stats::filter in the function above.
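
As a quick illustration of the `sides` argument discussed in the comments, the sketch below runs the same three-point filter once centered and once trailing. The input vector and variable names are made up for the example; `as.vector` drops the time-series class that `stats::filter` returns.

```r
x <- c(2, 4, 6, 8, 10, 12)   # made-up data
w <- rep(1 / 3, 3)           # equal weights for a window of 3

# sides = 2: window centered on each point (NA at both ends)
centered <- as.vector(stats::filter(x, w, sides = 2))

# sides = 1: window covers the current point and the two before it
trailing <- as.vector(stats::filter(x, w, sides = 1))

print(centered)  # NA 4 6 8 10 NA
print(trailing)  # NA NA 4 6 8 10
```

With `sides = 1` no future values enter the average, which is usually what you want for time-stamped data.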

edited Mar 28 '19 at 16:16

Gregor Thomas

answered Feb 01 '11 at 12:06

Matti Pastell

56

I should point out that "sides=2" may be an important option in many people's use cases that they don't want to overlook. If you want only trailing information in your moving average, you should use sides=1. – evanrsparks Apr 02 '12 at 20:58
37

Some years later but dplyr now has a filter function, if you have this package loaded use `stats::filter` – blmoore Apr 08 '15 at 14:00
`sides = 2` is equivalent to align="center" for the zoo::rollmean or RcppRoll::roll_mean. `sides = 1` is equivalent to "right" alignment. I don't see a way to do "left" alignment or calculate with "partial" data (2 or more values)? – Matt L. Sep 18 '17 at 20:32
3

`stats::filter` gives a time series object. Pass the result to `as.vector` to get a vector. – qwr Jul 19 '20 at 08:34
This may be useful to read: https://stackoverflow.com/a/61777773/3348414 – SqueakyBeak Jul 11 '22 at 14:20

score 170 · Accepted Answer · edited Aug 11 '17 at 20:04

Rolling Means/Maximums/Medians in the zoo package (rollmean)
MovingAverages in TTR
ma in forecast

edited Aug 11 '17 at 20:04

Frank

answered Apr 13 '09 at 13:06

f3lix

3

What is the moving average in R not containing future values of given timestamp? I checked `forecast::ma` and it contains all neighbourhood, not right. – hhh Sep 07 '18 at 20:52
2

Try the `stats::filter` function instead. There you can set `sides = 1` for only past values. E.g. `stats::filter(x, rep(1,5), sides = 1)/5` for the mean over 5 values. – panuffel May 07 '21 at 12:03
1

I would add `frollmean` function in data.table. – Jordi Aceiton Oct 11 '22 at 10:57

pipefish · Answer 3 · 2019-02-26T11:34:03.777

38

Using cumsum should be sufficient and efficient. Assuming you have a vector x and you want a running mean over windows of n values (the running sum divided by n):

cx <- c(0,cumsum(x))
rsum <- (cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]) / n

As pointed out in the comments by @mzuther, this assumes that there are no NAs in the data. Dealing with those requires dividing each window by the number of non-NA values. Here's one way of doing that, incorporating the comment from @Ricardo Cruz:

cx <- c(0, cumsum(ifelse(is.na(x), 0, x)))
cn <- c(0, cumsum(ifelse(is.na(x), 0, 1)))
rx <- cx[(n+1):length(cx)] - cx[1:(length(cx) - n)]
rn <- cn[(n+1):length(cx)] - cn[1:(length(cx) - n)]
rsum <- rx / rn

This still has the issue that if all the values in a window are NA, both the window sum and the count are zero, so the result for that window is 0/0, which R evaluates to NaN (not an error).
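
The NA-aware steps above can be wrapped into a function. This is only a sketch: the name `rolling_mean_cumsum` is made up, it returns NA for windows that contain no non-NA value, and it assumes the data are finite (an Inf in x would turn later cumulative differences into NaN).

```r
# Hypothetical helper wrapping the cumsum approach; assumes finite values.
rolling_mean_cumsum <- function(x, n) {
  stopifnot(n >= 1, n <= length(x))
  ok <- !is.na(x)
  cx <- c(0, cumsum(ifelse(ok, x, 0)))  # running sum, NAs counted as 0
  cn <- c(0, cumsum(as.integer(ok)))    # running count of non-NA values
  hi <- (n + 1):length(cx)              # index just past each window
  lo <- 1:(length(cx) - n)              # index just before each window
  rx <- cx[hi] - cx[lo]                 # window sums
  rn <- cn[hi] - cn[lo]                 # window counts
  ifelse(rn == 0, NA_real_, rx / rn)    # all-NA window gives NA, not NaN
}

rolling_mean_cumsum(c(1, 2, NA, 4, 5), 2)  # 1.5 2.0 4.0 4.5
rolling_mean_cumsum(c(1, NA, NA, 4), 2)    # 1 NA 4
```

The result has length(x) - n + 1 elements, one per complete window, just like the original snippet.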

edited Feb 26 '19 at 11:34

answered Aug 12 '15 at 20:26

pipefish

10

One downside to this solution is that it can't handle missings: `cumsum(c(1:3,NA,1:3))` – Jthorpe Feb 24 '16 at 19:15
@Ricardo Cruz: it might be better to remove the NAs and adjust the vector length accordingly. Think of a vector with a lot of NAs -- zeros will pull the average toward zero, while removing the NAs will leave the average as it is. It all depends on your data and the question you want to answer, of course. :) – mzuther Oct 02 '18 at 14:24
1

@mzuther, I updated the answer following your comments. Thanks for the input. I think the correct way of dealing with missing data is not extending the window (by removing the NA values), but by averaging each window by the correct denominator. – pipefish Oct 05 '18 at 17:33
1

rn <- cn[(n+1):length(cx)] - cx[1:(length(cx) - n)] should actually be rn <- cn[(n+1):length(cx)] - cn[1:(length(cx) - n)] – adrianmcmenamin Feb 21 '19 at 15:55
You also have to handle Inf values, no? – jangorecki Aug 31 '22 at 13:43
Actually, I think Inf are naturally handled correctly. Averaging over data that holds an Inf will result in an Inf. I'm not sure what happens (or should happen) if there are both Inf and -Inf, though – pipefish Jan 30 '23 at 09:16

jangorecki · Answer 4 · 2023-01-31T16:49:52.723

data.table 1.12.0 added the frollmean function, which computes a fast and exact rolling mean while carefully handling NA, NaN, +Inf and -Inf values.

As there is no reproducible example in the question there is not much more to address here.

You can find more information in the manual under ?frollmean, which is also available online.

Examples from manual below:

library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))

# rollmean of single vector and single window
frollmean(d[, V1], 3)

# multiple columns at once
frollmean(d, 3)

# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))

# multiple columns and multiple windows at once
frollmean(d, c(3, 4))

## three above are embarrassingly parallel using openmp

score 13 · Answer 5 · answered Aug 21 '13 at 17:11

The caTools package has very fast rolling mean/min/max/sd and a few other functions. I've only worked with runmean and runsd, and they are faster than the equivalents in any of the other packages mentioned so far.

answered Aug 21 '13 at 17:11

eddi

1

This is awesome! It is the only function that does this in a nice, simple way. And it's 2018 now... – Felipe Gerard Apr 17 '18 at 22:30

score 11 · Answer 6 · answered Apr 23 '20 at 03:59

Here is example code showing how to compute a centered moving average and a trailing moving average using the rollmean function from the zoo package.

library(tidyverse)
library(zoo)

some_data = tibble(day = 1:10)
# cma = centered moving average
# tma = trailing moving average
some_data = some_data %>%
    mutate(cma = rollmean(day, k = 3, fill = NA)) %>%
    mutate(tma = rollmean(day, k = 3, fill = NA, align = "right"))
some_data
#> # A tibble: 10 x 3
#>      day   cma   tma
#>    <int> <dbl> <dbl>
#>  1     1    NA    NA
#>  2     2     2    NA
#>  3     3     3     2
#>  4     4     4     3
#>  5     5     5     4
#>  6     6     6     5
#>  7     7     7     6
#>  8     8     8     7
#>  9     9     9     8
#> 10    10    NA     9

You can use one mutate call for multiple new columns by separating each new column with a comma. — H5470, Oct 23 '20 at 19:32

score 10 · Answer 7 · edited Aug 16 '18 at 05:22

You could use RcppRoll for very quick moving averages written in C++. Just call the roll_mean function. Docs can be found here.

Otherwise, this (slower) for loop should do the trick:

ma <- function(arr, n=15){
  res = arr
  for(i in n:length(arr)){
    res[i] = mean(arr[(i-n):i])
  }
  res
}

edited Aug 16 '18 at 05:22

Jaap

answered Jan 16 '15 at 14:49

cantdutchthis

3

Can you please explain me in details, how does this algorithm work? Because I cannot understand the idea – Daniel Yefimov Mar 13 '17 at 16:01
First he initializes a vector of the same length with `res = arr`. Then there is a loop that iterates starting at `n`, i.e. the 15th element, to the end of the array. That means the very first subset he takes the mean of is `arr[1:15]`, which fills spot `res[15]`. Now, I prefer setting `res = rep(NA, length(arr))` instead of `res = arr`, so each element of `res[1:14]` equals NA rather than a number where we couldn't take a full average of 15 elements. – Evan Friedland Sep 17 '18 at 00:50
I think it should be `arr[(i-n+1):i]` – gaspar Jul 12 '21 at 10:11

score 7 · Answer 8 · edited May 23 '17 at 11:47

In fact RcppRoll is very good.

The code posted by cantdutchthis needs a correction on its fourth line so that the window has a fixed length of n:

ma <- function(arr, n=15){
  res = arr
  for(i in n:length(arr)){
    res[i] = mean(arr[(i-n+1):i])
  }
  res
}
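
To see why the `+ 1` matters, here is a small check. It is a sketch only: the function name `ma_trailing` and the data are made up, and it also initializes the result with NA as suggested in the comments on the original answer.

```r
n <- 3
i <- 5
length((i - n):i)      # 4 indices: one more than the window size
length((i - n + 1):i)  # 3 indices: exactly n

# Corrected loop, with NA wherever no full window is available
ma_trailing <- function(arr, n = 15) {
  res <- rep(NA_real_, length(arr))
  if (length(arr) >= n) {
    for (i in n:length(arr)) {
      res[i] <- mean(arr[(i - n + 1):i])
    }
  }
  res
}

x <- c(1, 3, 5, 7, 9)
ma_trailing(x, n = 3)                                  # NA NA 3 5 7
as.vector(stats::filter(x, rep(1 / 3, 3), sides = 1))  # same values
```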

Another way, which handles missings, is given here.

A third way, improving cantdutchthis code to calculate partial averages or not, follows:

ma <- function(x, n = 2, parcial = TRUE) {
  res <- x  # result has the same length as the input
  for (i in seq_along(x)) {
    t <- max(i - n + 1, 1)  # clamp the window start at the first element
    res[i] <- mean(x[t:i])
  }
  if (parcial) {
    res  # keep the partial averages at the start
  } else {
    res[seq_along(res) >= n]  # drop the first n - 1 partial averages
  }
}

Cristóbal Alcázar · Answer 9 · 2017-06-29T14:29:46.520

To complement the answers of cantdutchthis and Rodrigo Remedio:

moving_fun <- function(x, w, FUN, ...) {
  # x: a double vector
  # w: the length of the window, i.e., the section of the vector selected to apply FUN
  # FUN: a function that takes a vector and return a summarize value, e.g., mean, sum, etc.
  # Given a double type vector apply a FUN over a moving window from left to the right, 
  #    when a window boundary is not a legal section, i.e. lower_bound and i (upper bound) 
  #    are not contained in the length of the vector, return a NA_real_
  if (w < 1) {
    stop("The length of the window 'w' must be greater than 0")
  }
  output <- x
  for (i in 1:length(x)) {
     # plus 1 because the index is inclusive with the upper_bound 'i'
    lower_bound <- i - w + 1
    if (lower_bound < 1) {
      output[i] <- NA_real_
    } else {
      output[i] <- FUN(x[lower_bound:i], ...)
    }
  }
  output
}

# example
v <- seq(1:10)

# compute a MA(2)
moving_fun(v, 2, mean)

# compute moving sum of two periods
moving_fun(v, 2, sum)

score 5 · Answer 10 · answered Sep 17 '20 at 20:51

You may calculate the moving average of a vector x with a window width of k by:

apply(embed(x, k), 1, mean)
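
To make the one-liner less opaque, this sketch shows what `embed` produces for a small made-up vector. Each row of the matrix is one window of k consecutive values, so averaging the rows gives one mean per complete window; `rowMeans` is used here as a shortcut for `apply(m, 1, mean)`.

```r
x <- c(1, 2, 3, 4, 5)  # made-up data
k <- 3

m <- embed(x, k)  # each row is one window, most recent value first
print(m)
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    4    3    2
# [3,]    5    4    3

win_means <- rowMeans(m)  # same result as apply(m, 1, mean)
print(win_means)          # 2 3 4

# Optional: pad to the input length (trailing alignment)
padded <- c(rep(NA, k - 1), win_means)
print(padded)             # NA NA 2 3 4
```

Note that the unpadded result has length(x) - k + 1 elements, so it does not line up with x unless you pad it.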

answered Sep 17 '20 at 20:51

Feiming Chen

An extension of this to data.frames is: `apply(df,rc,FUN=function(x) apply(embed(x, k),1,mean))` . `rc` can be one or two, for rows or columns, respectively. – understorey May 28 '21 at 17:53
this is not moving average – Qbik Jul 08 '22 at 13:12

score 3 · Answer 11 · answered Mar 24 '20 at 18:43

The slider package can be used for this. It has an interface that has been specifically designed to feel similar to purrr. It accepts any arbitrary function, and can return any type of output. Data frames are even iterated over row wise. The pkgdown site is here.

library(slider)

x <- 1:3

# Mean of the current value + 1 value before it
# returned as a double vector
slide_dbl(x, ~mean(.x, na.rm = TRUE), .before = 1)
#> [1] 1.0 1.5 2.5


df <- data.frame(x = x, y = x)

# Slide row wise over data frames
slide(df, ~.x, .before = 1)
#> [[1]]
#>   x y
#> 1 1 1
#> 
#> [[2]]
#>   x y
#> 1 1 1
#> 2 2 2
#> 
#> [[3]]
#>   x y
#> 1 2 2
#> 2 3 3

The overhead of both slider and data.table's frollapply() should be pretty low (much faster than zoo). frollapply() looks to be a little faster for this simple example here, but note that it only takes numeric input, and the output must be a scalar numeric value. slider functions are completely generic, and you can return any data type.

library(slider)
library(zoo)
library(data.table)

x <- 1:50000 + 0L

bench::mark(
  slider = slide_int(x, function(x) 1L, .before = 5, .complete = TRUE),
  zoo = rollapplyr(x, FUN = function(x) 1L, width = 6, fill = NA),
  datatable = frollapply(x, n = 6, FUN = function(x) 1L),
  iterations = 200
)
#> # A tibble: 3 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 slider      19.82ms   26.4ms     38.4    829.8KB     19.0
#> 2 zoo        177.92ms  211.1ms      4.71    17.9MB     24.8
#> 3 datatable    7.78ms   10.9ms     87.9    807.1KB     38.7

MS Berends · Answer 12 · 2020-10-12T14:34:37.923

EDIT: took great joy in adding the side parameter, for a moving average (or sum, or ...) of e.g. the past 7 days of a Date vector.

For people just wanting to calculate this themselves, it's nothing more than:

# x = vector with numeric data
# w = window length
y <- numeric(length = length(x))

for (i in seq_len(length(x))) {
  ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
  ind <- ind[ind %in% seq_len(length(x))]
  y[i] <- mean(x[ind])
}

y

But it gets fun to make it independent of mean(), so you can calculate any 'moving' function!

# our workhorse:
moving_fn <- function(x, w, fun, side = "centre", ...) {
  # x = vector with numeric data
  # w = window length
  # fun = function to apply
  # side = side to take, (c)entre, (l)eft or (r)ight
  # ... = parameters passed on to 'fun'
  y <- numeric(length(x))
  for (i in seq_len(length(x))) {
    if (side %in% c("c", "centre", "center")) {
      ind <- c((i - floor(w / 2)):(i + floor(w / 2)))
    } else if (side %in% c("l", "left")) {
      ind <- c((i - floor(w) + 1):i)
    } else if (side %in% c("r", "right")) {
      ind <- c(i:(i + floor(w) - 1))
    } else {
      stop("'side' must be one of 'centre', 'left', 'right'", call. = FALSE)
    }
    ind <- ind[ind %in% seq_len(length(x))]
    y[i] <- fun(x[ind], ...)
  }
  y
}

# and now any variation you can think of!
moving_average <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = mean, side = side, na.rm = na.rm)
}

moving_sum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = sum, side = side, na.rm = na.rm)
}

moving_maximum <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = max, side = side, na.rm = na.rm)
}

moving_median <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = median, side = side, na.rm = na.rm)
}

moving_Q1 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.25)
}

moving_Q3 <- function(x, w = 5, side = "centre", na.rm = FALSE) {
  moving_fn(x = x, w = w, fun = quantile, side = side, na.rm = na.rm, 0.75)
}

score 1 · Answer 13 · answered Sep 11 '18 at 04:34

Although it is a bit slow, you can also use zoo::rollapply to perform calculations on matrices.

reqd_ma <- rollapply(x, FUN = mean, width = n)

where x is the data set, FUN = mean is the function (you can also change it to min, max, sd, etc.), and width is the size of the rolling window.
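
Since the question asks about a matrix, here is a base R sketch of the same idea without zoo. It swaps in `stats::filter` applied column by column via `apply`; the matrix and the names are made up for the example.

```r
m <- cbind(a = 1:6, b = c(10, 20, 30, 40, 50, 60))  # made-up data
n <- 3

# Centered moving average of each column; NA where the window is incomplete
ma_cols <- apply(m, 2, function(col) {
  as.vector(stats::filter(col, rep(1 / n, n), sides = 2))
})

print(ma_cols)
#       a  b
# [1,] NA NA
# [2,]  2 20
# [3,]  3 30
# [4,]  4 40
# [5,]  5 50
# [6,] NA NA
```

Unlike the rollapply call above with its default arguments, this keeps the number of rows equal to the input and marks the incomplete windows with NA.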

answered Sep 11 '18 at 04:34

Garima gulati

2

It is not slow. Compared to base R, it is *much* faster. `set.seed(123); x <- rnorm(1000); system.time(apply(embed(x, 5), 1, mean)); library(zoo); system.time(rollapply(x, 5, mean))` On my machine it is so fast that it returns a time of 0 seconds. – G. Grothendieck Sep 12 '18 at 15:55

GoGonzo · Answer 14 · 2020-01-16T18:33:28.397

One can use the runner package for moving functions, in this case the mean_run function. The problem with cummean is that it doesn't handle NA values, but mean_run does. The runner package also supports irregular time series, and windows can depend on dates:

library(runner)
set.seed(11)
x1 <- rnorm(15)
x2 <- sample(c(rep(NA,5), rnorm(15)), 15, replace = TRUE)
date <- Sys.Date() + cumsum(sample(1:3, 15, replace = TRUE))

mean_run(x1)
#>  [1] -0.5910311 -0.2822184 -0.6936633 -0.8609108 -0.4530308 -0.5332176
#>  [7] -0.2679571 -0.1563477 -0.1440561 -0.2300625 -0.2844599 -0.2897842
#> [13] -0.3858234 -0.3765192 -0.4280809

mean_run(x2, na_rm = TRUE)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7] -0.13873536 -0.14571604 -0.12596067 -0.11116961 -0.09881996 -0.08871569
#> [13] -0.05194292 -0.04699909 -0.05704202

mean_run(x2, na_rm = FALSE )
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.12188853 -0.13873536
#>  [7]          NA          NA          NA          NA          NA          NA
#> [13]          NA          NA          NA

mean_run(x2, na_rm = TRUE, k = 4)
#>  [1] -0.18760011 -0.09022066 -0.06543317  0.03906450 -0.10546063 -0.16299272
#>  [7] -0.21203756 -0.39209010 -0.13274756 -0.05603811 -0.03894684  0.01103493
#> [13]  0.09609256  0.09738460  0.04740283

mean_run(x2, na_rm = TRUE, k = 4, idx = date)
#>  [1] -0.187600111 -0.090220655 -0.004349696  0.168349653 -0.206571573 -0.494335093
#>  [7] -0.222969541 -0.187600111 -0.087636571  0.009742884  0.009742884  0.012326968
#> [13]  0.182442234  0.125737145  0.059094786

One can also specify other options like lag, and roll only at specific indexes. More in package and function documentation.

score 1 · Answer 15 · answered Jul 19 '20 at 23:55

Here is a simple function with filter demonstrating one way to take care of beginning and ending NAs with padding, and computing a weighted average (supported by filter) using custom weights:

wma <- function(x) {
  wts <- c(seq(0.5, 4, 0.5), seq(3.5, 0.5, -0.5))
  nside <- (length(wts) - 1) / 2
  # pad x with its first and last values so that filter returns no NAs
  # (base R indexing and as.vector() are used, so no dplyr or pipe is needed)
  xp <- c(rep(x[1], nside), x, rep(x[length(x)], nside))
  z <- as.vector(stats::filter(xp, wts / sum(wts), sides = 2))
  z[(nside + 1):(nside + length(x))]
}
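
The same padding idea works with any symmetric, odd-length set of weights. The sketch below is a hypothetical generalization (the name `wma_pad` and the example weights are not from the answer) that takes the weights as an argument and shows the result on a short made-up series.

```r
# Hypothetical generalization: weights are passed in; odd length assumed.
wma_pad <- function(x, wts) {
  stopifnot(length(wts) %% 2 == 1)
  nside <- (length(wts) - 1) / 2
  # repeat the first and last values so every point has a full window
  xp <- c(rep(x[1], nside), x, rep(x[length(x)], nside))
  z <- as.vector(stats::filter(xp, wts / sum(wts), sides = 2))
  z[(nside + 1):(nside + length(x))]
}

x <- c(1, 1, 1, 4, 1, 1, 1)    # made-up data with a single spike
res <- wma_pad(x, c(1, 2, 1))  # triangular weights 0.25, 0.5, 0.25
print(res)                     # 1.00 1.00 1.75 2.50 1.75 1.00 1.00
```

The output has the same length as the input and contains no NAs, at the cost of treating the series as constant beyond its ends.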

score 0 · Answer 16 · edited Jul 16 '20 at 12:27

vector_avg <- function(x){
  sum_x = 0
  for(i in 1:length(x)){
    if(!is.na(x[i]))
      sum_x = sum_x + x[i]
  }
  return(sum_x/length(x))
}

edited Jul 16 '20 at 12:27

Rui Barradas

answered Jul 16 '20 at 12:11

Mohamed Galia

2

Please add a description for further detail. – Farbod Ahmadian Jul 16 '20 at 12:52
1

Please relate your answer to the question and include some output which shows the question has been answered. See [answer] for guidance on making a good answer. – Peter Jul 16 '20 at 15:35

score 0 · Answer 17 · answered Dec 07 '20 at 15:17

I use aggregate along with a vector created by rep(). This has the advantage that you can use cbind() to aggregate more than one column of your data frame at a time. Below is an example of a moving average of 60 for a vector (v) of length 1000:

v=1:1000*0.002+rnorm(1000)
mrng=rep(1:round(length(v)/60+0.5), length.out=length(v), each=60)
aggregate(v~mrng, FUN=mean, na.rm=T)

Note the first argument in rep is to simply get enough unique values for the moving range, based on the length of the vector and the amount to be averaged; the second argument keeps the length equal to the vector length, and the last repeats the values of the first argument the same number of times as the averaging period.

In aggregate you could use several functions (median, max, min); mean is shown as an example. Again, you could use a formula with cbind to do this on more than one (or all) columns in a data frame.
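
One thing worth checking before using this approach: grouping with rep(..., each = k) gives one mean per non-overlapping block of k values, not a window that slides by one observation. The small sketch below, with made-up data and k = 3, puts the two side by side; the sliding version uses `stats::filter` purely for comparison.

```r
v <- c(1, 2, 3, 4, 5, 6)   # made-up data
grp <- rep(1:2, each = 3)  # two blocks of three observations

# Block means via aggregate: one value per block
block_means <- aggregate(v ~ grp, FUN = mean)$v
print(block_means)         # 2 5

# Sliding (trailing) means: one value per position
moving_means <- as.vector(stats::filter(v, rep(1 / 3, 3), sides = 1))
print(moving_means)        # NA NA 2 3 4 5
```

Which of the two you need depends on whether you want a downsampled summary or a smoothed series of the original length.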

score 0 · Answer 18 · answered Jul 14 '23 at 11:01

Another useful function if you want the two ends of the series not to be NA, but moving averages computed over the truncated windows available there:

smoothing = function(x, k=1) {
  sapply(seq_along(x), function(i) {
    i.min = max(i-k, 1)
    i.max = min(i+k, length(x))
    mean(x[i.min:i.max], na.rm=TRUE)
  })
}

Example:

x = 1:10/2

[1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

smoothing(x, 2)

[1] 1.00 1.25 1.50 2.00 2.50 3.00 3.50 4.00 4.25 4.50
