I would like to calculate monthly non-cumulative subtotals for my data frame (df).
"date" "id" "change"
2010-01-01 1 NA
2010-01-07 2 3
2010-01-15 2 -1
2010-02-01 1 NA
2010-02-04 2 7
2010-02-22 2 -2
2010-02-26 2 4
2010-03-01 1 NA
2010-03-14 2 -4
2010-04-01 1 NA
A new period starts at the first day of a new month. The column "id" serves as a grouping variable for the beginning of a new period (==1) and observations within a period (==2). The goal is to sum up all changes within a month and then restart at 0 for the next period. The output should be stored in an additional column of df.
Here a reproducible example for my data frame:
require(dplyr)
require(tidyr)
require(lubridate)
date <- ymd(c("2010-01-01","2010-01-07","2010-01-15","2010-02-01","2010-02-04","2010-02-22","2010-02-26","2010-03-01","2010-03-14","2010-04-01"))
df <- data.frame(date)
df$id <- as.numeric((c(1,2,2,1,2,2,2,1,2,1)))
df$change <- c(NA,3,-1,NA,7,-2,4,NA,-4,NA)
What i have tried to do:
df <- df %>%
group_by(id) %>%
mutate(total = cumsum(change)) %>%
ungroup() %>%
fill(total, .direction = "down") %>%
filter(id == 1)
Which leads to this output:
"date" "id" "change" "total"
2010-01-01 1 NA NA
2010-02-01 1 NA 2
2010-03-01 1 NA 11
2010-04-01 1 NA 7
The problem lies with the function cumsum, which accumulates all the preceding values from a group and does not restart at 0 for a new period.
The desired output looks like this:
"date" "id" "change" "total"
2010-01-01 1 NA NA
2010-02-01 1 NA 2
2010-03-01 1 NA 9
2010-04-01 1 NA -4
The rows with "id" ==1 show the sum of changes for all preceding columns with "id" ==2, restarting at 0 for every period. Does there exist a specific command for this type of problem? Could anyone provide a corrected alternative to the code above?