moving average by secondary factor in r

Question

I'm using this as an example (Calculating moving average), which I have successfully incorporated into my code. I need to calculate a rolling mean and a rolling median (which I have done), but my data set is enormous and I need to add a secondary variable to group the calculation by. The example below calculates the rolling mean for a data set of 10 days. What if there are 10 days for each of several locations, and the rolling means need to be calculated separately for each location?

library(tidyverse)
library(zoo)

some_data = tibble(day = 1:10)
# cma = centered moving average
# tma = trailing moving average
some_data = some_data %>%
  mutate(roll_mean = rollmean(day, k = 3, fill = NA)) %>%
  mutate(roll_median = rollmedian(day, k = 3, fill = NA, align = "right"))
some_data

Have you tried `some_data %>% group_by(grouping_variable) %>% mutate(` etc? — Allan Cameron, Jun 14 '20 at 09:39

Waldi · Accepted Answer · 2020-06-14T11:44:31.793

2

You can group by location:

library(tidyverse)
library(zoo)

some_data <- rbind(tibble(day = 1:5,location = c(rep("A",5))),
                   tibble(day = 1:5,location = c(rep("B",5))))

some_data <- some_data %>% group_by(location) %>%
  mutate(roll_mean_left = rollmean(day, k = 3, fill = NA, align='left'),
         roll_mean_right = rollmean(day, k = 3, fill = NA, align='center'),
         roll_median_center = rollmedian(day, k = 3, fill = NA, align = 'right'))

some_data

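Because the data is grouped, each rolling function restarts for every location.
Note how the rolling window moves according to the align parameter. The column names do not all match the align values used in the code: roll_mean_right is computed with align='center' and roll_median_center with align='right'.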

     day location roll_mean_left roll_mean_right roll_median_center
   <int> <chr>             <dbl>           <dbl>              <dbl>
 1     1 A                     2              NA                 NA
 2     2 A                     3               2                 NA
 3     3 A                     4               3                  2
 4     4 A                    NA               4                  3
 5     5 A                    NA              NA                  4
 6     1 B                     2              NA                 NA
 7     2 B                     3               2                 NA
 8     3 B                     4               3                  2
 9     4 B                    NA               4                  3
10     5 B                    NA              NA                  4

edited Jun 14 '20 at 11:44

answered Jun 14 '20 at 09:40


Thanks! What does align="right" do? Does it need to be there, and is it specific to rollmedian? I removed it in my code and I don't know if it makes a difference. Also, when I ran an ANOVA for day by location, the result was not significantly different, but roll_mean (of day) by location is significantly different. I'm trying to understand why. Wouldn't the rolling mean just average the 3 values, so it shouldn't be significantly different? – L55 Jun 14 '20 at 09:43
It tells you whether the rolling window is aligned on the right or on the left. When it is aligned on the right, the first 2 result rows are NA (because you need 3 rows to begin the calculation). – Waldi Jun 14 '20 at 09:48
The default align value for rollmean is 'center', but you can test 'left' or 'right' to see the difference. – Waldi Jun 14 '20 at 10:06

This post https://stackoverflow.com/questions/61777516/how-to-calculate-7-day-moving-average-in-r/61777855#61777855 has a discussion of `align=`. Note that the various roll* functions can be suffixed with an r on the end, e.g. `rollmeanr`, in which case they default to `align = "right"`. – G. Grothendieck Jun 14 '20 at 11:53
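To make the comments about `align=` concrete, here is a minimal sketch on a plain vector. The vector `x` and the variable names are made up for illustration. The expected values in the comments are the same ones that appear per location in the accepted answer's output.

```r
library(zoo)

# Hypothetical input, chosen to match one location in the answer above
x <- c(1, 2, 3, 4, 5)

# Window of 3 with the result placed at the middle of the window (the default)
centered <- rollmean(x, k = 3, fill = NA)
# Result placed at the last element of the window (trailing average)
trailing <- rollmean(x, k = 3, fill = NA, align = "right")
# Result placed at the first element of the window
leading <- rollmean(x, k = 3, fill = NA, align = "left")
# The "r"-suffixed variant defaults to align = "right"
trailing_r <- rollmeanr(x, k = 3, fill = NA)

centered  # NA  2  3  4 NA
trailing  # NA NA  2  3  4
leading   #  2  3  4 NA NA
isTRUE(all.equal(trailing, trailing_r))
```

With `fill = NA` the result keeps the length of the input, which is what lets it be used directly inside `mutate()`.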

score 0 · Answer 2 · answered May 31 '23 at 15:41

Small note:

mutate(roll_mean_left = rollmean(day, k = 3, fill = NA, align='left'),
     roll_mean_right = rollmean(day, k = 3, fill = NA, align='center'),
     roll_median_center = rollmedian(day, k = 3, fill = NA, align = 'right'))

is a bit misleading; I think it was meant to be:

mutate(roll_mean_left = rollmean(day, k = 3, fill = NA, align='left'),
     roll_mean_right = rollmean(day, k = 3, fill = NA, align='right'),
     roll_median_center = rollmedian(day, k = 3, fill = NA, align = 'center'))

Note the change to the "align=" arguments so that they match the variable names.
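For completeness, a self-contained sketch of the grouped calculation with the `align=` values matching the column names. It assumes the same toy data as the accepted answer, built with a base `data.frame` instead of `tibble` to keep the dependencies small. The name `rolled` is only for illustration.

```r
library(dplyr)
library(zoo)

# Toy data: days 1..5 for each of two locations (same values as the accepted answer)
some_data <- data.frame(
  day = rep(1:5, times = 2),
  location = rep(c("A", "B"), each = 5)
)

rolled <- some_data %>%
  group_by(location) %>%   # rolling windows restart within each location
  mutate(
    roll_mean_left     = rollmean(day, k = 3, fill = NA, align = "left"),
    roll_mean_right    = rollmean(day, k = 3, fill = NA, align = "right"),
    roll_median_center = rollmedian(day, k = 3, fill = NA, align = "center")
  ) %>%
  ungroup()                # drop the grouping so later steps are not grouped by accident

rolled
```

Within each location, `roll_mean_right` should now have its NAs in the first two rows and `roll_median_center` should have one NA at each end.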
