0

I would like to calculate monthly non-cumulative subtotals for my data frame (df).

  "date"    "id"   "change" 
2010-01-01    1       NA        
2010-01-07    2        3        
2010-01-15    2       -1        
2010-02-01    1       NA        
2010-02-04    2        7        
2010-02-22    2       -2        
2010-02-26    2        4        
2010-03-01    1       NA
2010-03-14    2       -4 
2010-04-01    1       NA      

A new period starts at the first day of a new month. The column "id" serves as a grouping variable for the beginning of a new period (==1) and observations within a period (==2). The goal is to sum up all changes within a month and then restart at 0 for the next period. The output should be stored in an additional column of df.

Here a reproducible example for my data frame:

require(dplyr)
require(tidyr)
require(lubridate)

date <- ymd(c("2010-01-01","2010-01-07","2010-01-15","2010-02-01","2010-02-04","2010-02-22","2010-02-26","2010-03-01","2010-03-14","2010-04-01"))   
df <- data.frame(date)
df$id <- as.numeric((c(1,2,2,1,2,2,2,1,2,1)))
df$change <- c(NA,3,-1,NA,7,-2,4,NA,-4,NA)

What i have tried to do:

df <- df %>%
group_by(id) %>%
mutate(total = cumsum(change)) %>%
ungroup() %>%
fill(total, .direction = "down") %>%
filter(id == 1)

Which leads to this output:

  "date"    "id"   "change"  "total"
 2010-01-01    1       NA        NA
 2010-02-01    1       NA        2
 2010-03-01    1       NA        11
 2010-04-01    1       NA        7

The problem lies with the function cumsum, which accumulates all the preceding values from a group and does not restart at 0 for a new period.

The desired output looks like this:

  "date"    "id"   "change"  "total"
2010-01-01    1       NA        NA
2010-02-01    1       NA        2
2010-03-01    1       NA        9
2010-04-01    1       NA       -4

The rows with "id" ==1 show the sum of changes for all preceding columns with "id" ==2, restarting at 0 for every period. Does there exist a specific command for this type of problem? Could anyone provide a corrected alternative to the code above?

Bevelmack
  • 21
  • 1
  • 3
  • 1
    Possible duplicate: https://stackoverflow.com/questions/52960348/conditional-running-count-cumulative-sum-with-reset-in-r-dplyr – MrFlick Apr 24 '19 at 17:13
  • Possible duplicate: https://stackoverflow.com/questions/49050891/r-cumulative-sum-with-a-condition-and-a-reset – MrFlick Apr 24 '19 at 17:14
  • Possible duplicate: https://stackoverflow.com/questions/8536529/r-resetting-a-cumsum-to-zero-at-the-start-of-each-year – MrFlick Apr 24 '19 at 17:14

1 Answers1

1

We may need to also use year-month formatted 'date' in the grouping variable to reset for each month

library(dplyr)
df %>%
  group_by(id, grp = format(date, "%Y-%m")) %>%
  mutate(total = cumsum(change)) %>%   
  ungroup() %>%
  fill(total, .direction = "down") %>%
  filter(id == 1) %>%
  ungroup %>%
  select(-grp)
# A tibble: 4 x 4
#  date          id change total
#  <date>     <dbl>  <dbl> <dbl>
#1 2010-01-01     1     NA    NA
#2 2010-02-01     1     NA     2
#3 2010-03-01     1     NA     9
#4 2010-04-01     1     NA    -4
akrun
  • 874,273
  • 37
  • 540
  • 662