I am currently working with a data set in R that looks somewhat like the following (except with millions of ids and observations):
id agedays diar
1 1 1
1 2 0
1 3 1
1 4 1
1 5 0
1 6 0
1 7 NA
1 8 1
1 9 1
1 10 1
3 2 0
3 5 0
3 6 0
3 8 1
3 9 1
4 1 0
4 4 NA
4 5 0
4 6 1
4 7 0
I need a rolling sum of diar over a window defined by agedays values rather than by row position. For each row, a new variable called diar_prev5 should hold the sum of diar over the 5-day window ending on that row's agedays (the current day plus the 4 days before it), within the same id. The data set should look like the following:
id agedays diar diar_prev5
1 1 1 NA
1 2 0 NA
1 3 1 NA
1 4 1 NA
1 5 0 3
1 6 0 2
1 7 NA 2
1 8 1 2
1 9 1 2
1 10 1 3
3 2 0 NA
3 5 0 0
3 6 0 0
3 8 1 1
3 9 1 2
4 1 0 NA
4 4 NA NA
4 5 0 0
4 6 1 1
4 7 0 1
As shown above, the rolling sum should include the diar value on the current agedays. If some rows between the current row and 4 days back contain NA, the sum should ignore them and still add up whatever observations exist in between. I tried both roll_sum and rollsum, but neither worked when the agedays column contained gaps: wherever a gap occurred, the result was NA instead of the sum of the values that were present. The functions also don't seem to include the current row's value in the sum, so I previously had to add it manually afterwards.
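To make the rule concrete, here is a minimal and deliberately slow reference sketch in base R. The names toy and diar_window_sum are hypothetical, and the code assumes (inferred from the expected table, not stated anywhere) that rows with agedays below 5 get NA because no full 5-day window exists yet. It is only meant to pin down the expected values on the sample data, not to scale to millions of rows.

```r
# Toy copy of the sample data shown above.
toy <- data.frame(
  id      = c(rep(1, 10), rep(3, 5), rep(4, 5)),
  agedays = c(1:10, 2, 5, 6, 8, 9, 1, 4, 5, 6, 7),
  diar    = c(1, 0, 1, 1, 0, 0, NA, 1, 1, 1,
              0, 0, 0, 1, 1,
              0, NA, 0, 1, 0)
)

# Hypothetical helper for ONE id: for each row, sum x over the days
# (day - width + 1) .. day, skipping NA values.
# Assumption: rows with day < width have no full window and return NA.
diar_window_sum <- function(day, x, width = 5) {
  vapply(seq_along(day), function(k) {
    if (day[k] < width) return(NA_real_)
    in_window <- day > day[k] - width & day <= day[k]
    sum(x[in_window], na.rm = TRUE)
  }, numeric(1))
}

# Apply the helper id by id, keeping the original row order.
toy$diar_prev5 <- NA_real_
for (i in unique(toy$id)) {
  rows <- toy$id == i
  toy$diar_prev5[rows] <- diar_window_sum(toy$agedays[rows], toy$diar[rows])
}

print(toy)
```

Because the window is selected by comparing agedays values rather than by counting rows, gaps in agedays need no special handling. The cost is quadratic in the number of rows per id.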
The roll_sum code I used previously, which did not work, is below:
DT[, diar_prev5 := roll_sum(lag(diar, 1L), n=4L, fill=NA, align = "right"), by=id]
My question is: how can I write a custom function that includes the current value of diar in the sum and is not thrown off by gaps in the agedays values?
I've tried the following, but the resulting variable contains only 0s:
f = function(id_input, ageday_input) {
  startday = ageday_input
  endday = ageday_input - 13
  sum((MPC_anthro %>% filter(id == id_input & agedays <= startday & startday <= endday))$diar)
}
f = Vectorize(f)
MPC_anthro_1 <- MPC_anthro %>% mutate(diar_prev5 = f(id, agedays))
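A note on the attempt above: since endday is always startday - 13, the condition startday <= endday is never true, so filter() returns no rows and sum() of an empty vector is 0. The offset of 13 also does not match a 5-day window. One possible direction that avoids a row-by-row filter is a non-equi self-join in data.table, sketched below on the sample data. The names win_start, window and sums are hypothetical, the NA rule for agedays below 5 is an assumption inferred from the expected table, and this has not been benchmarked on millions of rows.

```r
library(data.table)

# Sample data from the question.
DT <- data.table(
  id      = c(rep(1L, 10), rep(3L, 5), rep(4L, 5)),
  agedays = c(1:10, 2L, 5L, 6L, 8L, 9L, 1L, 4L, 5L, 6L, 7L),
  diar    = c(1, 0, 1, 1, 0, 0, NA, 1, 1, 1,
              0, 0, 0, 1, 1,
              0, NA, 0, 1, 0)
)

window <- 5L                                # current day plus the 4 days before it
DT[, win_start := agedays - (window - 1L)]  # first day of each row's window

# Non-equi self-join: for every row of the inner DT (i), match the rows of the
# outer DT (x) with the same id and win_start <= x.agedays <= i.agedays, then
# sum their diar values while ignoring NA. by = .EACHI yields one result per
# row of i, in the original row order.
sums <- DT[DT,
           on = .(id, agedays >= win_start, agedays <= agedays),
           .(s = sum(diar, na.rm = TRUE)),
           by = .EACHI]$s

DT[, diar_prev5 := sums]
# Assumption: rows without a full window (agedays < 5) should be NA.
DT[agedays < window, diar_prev5 := NA_real_]
DT[, win_start := NULL]                     # drop the helper column

print(DT)
```

The join matches on agedays values, not row positions, so gaps in agedays are handled naturally, and the upper bound agedays <= agedays is what brings the current row's diar into the sum.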