In R, I am using the roll_sum function from RcppRoll.
I have a sequence stored in a column vector as
0 1 0 1 1 1 1 0 1 0 1 0
I want a 3-period rolling sum with right alignment, which should give:
1, 2, 2, 3, 3, 2, 2, 1, 2, 1, 1, 0
Instead I am getting:
1, 2, 2, 3, 3, 2, 2, 1, 2, 1, NA, NA
This happens because once the window reaches the end of the sequence and covers only 2 values or 1 value, no sum is returned.
Is this what partial is supposed to solve? How can I get the last n-1 periods to sum with a partial or collapsing window?
My current best idea is to append n values to the sequence so that the window is always full over the actual data, and then remove the NAs after the calculation.
That is slightly complicated because the values are ordered by date, so appending dates at the end requires some conditional logic, as the sum is applied to grouped (dplyr group_by) data.
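The padding idea can be sketched in base R, without RcppRoll. This is a minimal sketch, not a library function: roll_sum_forward_partial is a hypothetical helper name, and it assumes the input has no NA values. It pads with n - 1 zeros rather than n, since zeros do not change a sum and nothing needs to be removed afterwards. The window at each position covers that value and the next n - 1 values, which is what the first expected output above corresponds to.

```r
# Hypothetical helper (not part of RcppRoll): rolling sum over the current
# value and the next n - 1 values, with the window shrinking at the end.
# Assumes x contains no NA values.
roll_sum_forward_partial <- function(x, n) {
  # Pad with n - 1 zeros so every window is full; zeros leave the sum unchanged.
  padded <- c(x, rep(0, n - 1))
  # Cumulative sum with a leading 0, so each window sum is a difference
  # of two entries.
  cs <- c(0, cumsum(padded))
  idx <- seq_along(x)
  cs[idx + n] - cs[idx]
}

x <- c(0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0)
roll_sum_forward_partial(x, 3)
# 1 2 2 3 3 2 2 1 2 1 1 0
```

Because the padding is added to the values only, inside the helper, no extra dates have to be appended to the grouped data.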
Here is an example:
library(RcppRoll)
df <- data.frame(data = c(0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0))
df$sum <- roll_sum(df$data, n = 3, weights = NULL, fill = NA, align = "right", na.rm = TRUE, partial = TRUE)
This returns
NA NA 1 2 2 3 3 2 2 1 2 1
When
0 1 1 2 2 3 3 2 2 1 2 1
is desired.
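For reference, the desired right-aligned output can be reproduced in base R with a cumulative-sum difference. This is only a sketch to pin down the expected behaviour: roll_sum_right_partial is a hypothetical name, not an RcppRoll function, and it assumes the input has no NA values. As far as I can tell, the RcppRoll documentation describes the partial argument as currently unimplemented, which would explain why setting it changes nothing.

```r
# Hypothetical helper (not part of RcppRoll): right-aligned rolling sum
# whose window is allowed to be shorter than n at the start of the vector.
# Assumes x contains no NA values.
roll_sum_right_partial <- function(x, n) {
  cs <- cumsum(x)
  # Running total as of n positions earlier; 0 where the window would
  # start before the first element.
  lagged <- c(rep(0, n), cs)[seq_along(x)]
  cs - lagged
}

x <- c(0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0)
roll_sum_right_partial(x, 3)
# 0 1 1 2 2 3 3 2 2 1 2 1
```

The result has the same length as the input, so it could be used directly inside mutate() on grouped data.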
This is how the rolling sum is applied to the grouped data; the grouping can be dealt with once the NA issue is solved. I have sorted the data in descending date order, but the problem remains.
library(dplyr)      # %>%, group_by, summarise, arrange, mutate, filter, select
library(tidyr)      # complete, full_seq
library(lubridate)  # as_date
library(RcppRoll)   # roll_sum

rolling_data <- rolling_svu %>%
  group_by(TEAM, DATE) %>%
  summarise(sumML = sum(LOAD)) %>%
  complete(DATE = full_seq(GAME_DATE, 1)) %>%
  arrange(desc(DATE)) %>%
  mutate(game_played = ifelse(is.na(sumML), 0, 1),
         scheduled_next_7 = roll_sum(game_played, n = forward_window_period, weights = NULL, fill = NA, align = "right", na.rm = TRUE, partial = TRUE)) %>%
  arrange(TEAM) %>%
  filter(!is.na(sumML)) %>%
  select(TEAM, DATE, scheduled_next_7)
rolling_data$GAME_DATE <- as_date(rolling_sportvu_team$GAME_DATE)