I think zoo::rollmean works well here, and dplyr::group_by can handle as many index variables as you need:
library(dplyr)
mtcars %>%
  group_by(cyl, am, vs) %>%
  mutate(across(c(mpg, disp), list(rm = ~ zoo::rollmeanr(., 2, fill = NA))))
# # A tibble: 32 x 13
# # Groups: cyl, am, vs [7]
# mpg cyl disp hp drat wt qsec vs am gear carb mpg_rm disp_rm
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 NA NA
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 21 160
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 NA NA
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 NA NA
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 NA NA
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 19.8 242.
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 16.5 360
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 NA NA
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 23.6 144.
# 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 18.6 196.
# # ... with 22 more rows
The fill=NA argument pads each group's result back to its original length: the first row in each series has no history to average over, so it gets NA. If you would rather have the first row in a series be the average of itself alone, use partial=TRUE instead, which means switching to rollapplyr:
mtcars %>%
  group_by(cyl, am, vs) %>%
  mutate(across(c(mpg, disp), list(rm = ~ zoo::rollapplyr(., 2, FUN = mean, partial = TRUE))))
# # A tibble: 32 x 13
# # Groups: cyl, am, vs [7]
# mpg cyl disp hp drat wt qsec vs am gear carb mpg_rm disp_rm
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 21 160
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 21 160
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 22.8 108
# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 21.4 258
# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 18.7 360
# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 19.8 242.
# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 16.5 360
# 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 24.4 147.
# 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 23.6 144.
# 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 18.6 196.
# # ... with 22 more rows
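To see the two behaviours side by side without the grouping, here is a minimal sketch on a made-up vector (x is a hypothetical toy series, not taken from mtcars):

```r
library(zoo)

# Hypothetical toy series, oldest value first
x <- c(10, 20, 30, 40)

# fill = NA: the first position has no complete window of 2, so it is NA
with_fill <- rollmeanr(x, 2, fill = NA)

# partial = TRUE: the first position averages only the value available (itself)
with_partial <- rollapplyr(x, 2, FUN = mean, partial = TRUE)

print(with_fill)     # NA 15 25 35
print(with_partial)  # 10 15 25 35
```

Both results have the same length as x, which is what mutate needs within each group.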
I've used the align="right" variants of zoo's functions (rollmeanr, rollapplyr), assuming that your moving average is historical and that time increases down the rows. If either assumption does not hold, choose deliberately among the align variants.
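As a rough illustration of what the alignment changes, this sketch runs a width-3 mean over a made-up vector (x is hypothetical) with each align setting. Only the position of the NA padding moves:

```r
library(zoo)

# Hypothetical series, earliest observation first
x <- c(1, 2, 3, 4, 5)

# "right": each value averages itself and the two rows before it (historical)
right <- rollmean(x, 3, fill = NA, align = "right")    # NA NA  2  3  4

# "center": each value averages itself and one row on either side
center <- rollmean(x, 3, fill = NA, align = "center")  # NA  2  3  4 NA

# "left": each value averages itself and the two rows after it (forward-looking)
left <- rollmean(x, 3, fill = NA, align = "left")      #  2  3  4 NA NA
```

rollmeanr(x, 3, fill = NA) is shorthand for the align = "right" call.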
I used dplyr::across here to handle an arbitrary number of columns in one step. Because I passed a named list of "tilde-functions" (formula-style lambdas), across appended each function's name to each column's name, which is where mpg_rm and disp_rm come from. You can break it out into individual mutate assignments if you prefer, for readability, maintainability, or if you need different arguments for each column.
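A sketch of the broken-out form, assuming you want a different window per column. The width of 3 for disp is an arbitrary choice made only to show per-column arguments, and res is just a name for the result:

```r
library(dplyr)

res <- mtcars %>%
  group_by(cyl, am, vs) %>%
  mutate(
    # same 2-row historical mean as in the across() version
    mpg_rm = zoo::rollapplyr(mpg, 2, FUN = mean, partial = TRUE),
    # hypothetical: a wider 3-row window for disp only
    disp_rm = zoo::rollapplyr(disp, 3, FUN = mean, partial = TRUE)
  ) %>%
  ungroup()

head(res[, c("cyl", "am", "vs", "mpg", "mpg_rm", "disp", "disp_rm")])
```

The ungroup() at the end is optional; it just keeps later steps from silently operating per group.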