R noob (still) here, working in tidyverse
/ RStudio.
I have a tidy dataset where each row has a date, a grouping characteristic, and a value (actual dataset more complicated but that's the core of it):
I group the data by Group
for each Date
, and calculate some summary stats of the Value
, yielding a by-group summary for each date. For instance:
grouped <- data %>% group_by(Date, Group) %>% summarise(mean = mean(Value))
head(grouped)
# A tibble: 6 × 3
# Groups: Date [4]
Date Group mean
<date> <fct> <dbl>
1 2021-02-18 A 37.4
2 2021-02-19 B 25.5
3 2021-02-19 A 26.1
4 2021-02-22 B 34.2
5 2021-02-22 A 26.4
6 2021-02-23 B 34.2
(Note: data is below for reproducibility.)
So far so good. Now I want to take the moving average of those summary stats (mean
in this case, but could be others) by Group
. I tried this with zoo::rollmean
:
grouped <- grouped %>%
group_by(Group) %>%
mutate(rolling = zoo::rollmean(mean, window_length, fill=NA))
But here a problem arises - ideally, the moving average should be strictly some number of days, not records, but there are some days missing for one or both groups.
What's the best way to ensure that the moving average correctly takes into account the missing days x groups, treating them as NA
as needed?
(I understand from this answer that zoo::rollmean
wouldn't be able to handle NA
values, but zoo::rollapply
should be able to.)
I have tried creating a simple calendar dataframe with the full set of dates to join
the grouped data to, but that leaves the Group
variable as NA
as well, so the missing days x groups are still ignored by the rollmean / rollapply
function.
Hope that all makes sense!
data <- structure(list(Date = structure(c(18676, 18677, 18677, 18680,
18680, 18680, 18680, 18680, 18680, 18680, 18680, 18680, 18680,
18680, 18680, 18681, 18681, 18681, 18681, 18681, 18681, 18681,
18681, 18681, 18681, 18681, 18681, 18681, 18681, 18681, 18681,
18681, 18681, 18681, 18682, 18682, 18682, 18682, 18682, 18683,
18683, 18683, 18683, 18683, 18683, 18683, 18683, 18683, 18683,
18683, 18683, 18683, 18684, 18684, 18684, 18684, 18684, 18684,
18684, 18684, 18684, 18684, 18684, 18685, 18685, 18685, 18685,
18685, 18685, 18685, 18685, 18685, 18685, 18685, 18687, 18687,
18687, 18687, 18687, 18687, 18687, 18687, 18687, 18688, 18688,
18688, 18688, 18688, 18688, 18688, 18688, 18688, 18689, 18689,
18689, 18689, 18689, 18689, 18690, 18690, 18690, 18690, 18690,
18690, 18690, 18690, 18691, 18691, 18691, 18691, 18691, 18691,
18691, 18691, 18691, 18691, 18692, 18692, 18692, 18692, 18692,
18692, 18692, 18692, 18692, 18692, 18692, 18692, 18693, 18694,
18694, 18694, 18694, 18694, 18694, 18694, 18694, 18694, 18694,
18694, 18694, 18695, 18695, 18695, 18695, 18695, 18695, 18695,
18695, 18695, 18696, 18696, 18696, 18696, 18696, 18696, 18696,
18696, 18696, 18697, 18697, 18697, 18697, 18697, 18697, 18697,
18697, 18697, 18698, 18698, 18698, 18698, 18698, 18698, 18698,
18698, 18698, 18699, 18699, 18699, 18699, 18699, 18699, 18699,
18699, 18699, 18699, 18699, 18699, 18699, 18699, 18699, 18699,
18699, 18699, 18699, 18700, 18701, 18701, 18701, 18701, 18701,
18701, 18701, 18701, 18701, 18701, 18701, 18701, 18701, 18701,
18701, 18702, 18702, 18702, 18702, 18702, 18702, 18702, 18702,
18702, 18702, 18702, 18702, 18702, 18702, 18702, 18702, 18702,
18702, 18702, 18703, 18703, 18703, 18703, 18703, 18703, 18703,
18703, 18703, 18703, 18703, 18703, 18703, 18703, 18703, 18703,
18703, 18703, 18703), class = "Date"), Group = structure(c(2L,
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L,
2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L,
2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("B", "A"), class = "factor"),
Value = c(37.43, 26.13, 25.54, 31.65, 26.95, 15.29, 35.93,
28.59, 17.14, 30.42, 20.52, 33.4, 35.3, 36.87, 28.32, 21.78,
25.49, 34.13, 20.35, 40.21, 16, 24.58, 23.61, 38.94, 36.76,
29.68, 15.97, 20.79, 17.83, 14.65, 16.76, 35.74, 31.5, 25.6,
32.96, 14.1, 40.4, 24.53, 39.57, 21.38, 14.49, 22.11, 27.12,
16.46, 17.65, 37.32, 15.74, 17.07, 28.52, 14.72, 27.75, 36.69,
39.47, 26.13, 35.57, 24.08, 24.39, 13.1, 16.75, 24.49, 23.61,
15.04, 23.22, 37.3, 36.76, 15.77, 28.34, 35.06, 28.32, 29.39,
19.09, 35.68, 35.9, 37.13, 36.1, 40.55, 33.97, 24.03, 37.25,
34.39, 13.05, 21.64, 40.02, 26.17, 19.39, 25.76, 40.92, 24.21,
20.35, 27.7, 29.53, 14.19, 15.64, 32.74, 31.42, 14.01, 12.85,
17.31, 31.67, 23.63, 17.29, 36.71, 18.19, 17.78, 34.87, 36.87,
19.27, 24.97, 41.66, 16.83, 34.79, 14.94, 34.39, 40.66, 31.35,
31.74, 36.19, 18.28, 37.61, 37.19, 29.58, 17.04, 28.84, 16.6,
41.97, 32.36, 27.91, 21.7, 40.45, 35.38, 41.19, 35.68, 19.49,
20.94, 23.99, 14.28, 39.24, 12.19, 18.02, 39.14, 40.61, 33.32,
38.68, 39.18, 31.76, 22.64, 38.18, 36.75, 30.91, 38.82, 30.68,
14.2, 39.34, 18.91, 12.7, 28.2, 37.85, 34.06, 12.88, 40.03,
29.95, 14.61, 17.01, 35.64, 20.49, 39.51, 29.29, 18.84, 36.42,
37.88, 32.65, 19.7, 19.84, 38.75, 21.25, 40.68, 17.89, 26.3,
37.22, 18.03, 17.33, 36.26, 41.98, 19.4, 20.54, 18.6, 26.92,
15.23, 20.22, 15.2, 35.49, 15.14, 14.43, 30.82, 14.79, 17.74,
36.8, 17.09, 18.09, 19.92, 39.64, 23.87, 22.67, 24.66, 24.33,
16.82, 17.91, 21.66, 30.79, 32.91, 25.16, 38.98, 15.49, 21.33,
38.47, 34.46, 24.22, 36.93, 22.25, 15.33, 41.38, 34.49, 23.44,
30.53, 10.62, 23.8, 28.94, 12.49, 22, 24.51, 14.72, 15.53,
23.23, 38.93, 16.06, 19.36, 35.91, 22.2, 15.85, 33.36, 31.75,
19.69, 29.86, 16.3, 19.42, 19.17, 14.41, 13.18, 20.67, 17.02
)), row.names = c(NA, -250L), class = c("tbl_df", "tbl",
"data.frame"))