0

I have this example dataset:

    df <- data.frame(ID = c(1, 1, 1, 2, 2, 2), A = c("2018-10-12",
          "2018-10-12", "2018-10-13", "2018-10-14", "2018-10-15", "2018-10-16"), 
          B = c(1, 5, 7, 2, 54, 202))
  ID       A      B
1  1 2018-10-12   1
2  1 2018-10-12   5
3  1 2018-10-13   7
4  2 2018-10-14   2
5  2 2018-10-15  54
6  2 2018-10-16 202

What I'm trying to do is create a column C that is the sum of B but only for dates before each respective row. For instance, the output I'm seeking is:

  ID       A      B   C
1  1 2018-10-12   1   1
2  1 2018-10-12   5   6
3  1 2018-10-13   7   13
4  2 2018-10-14   2   2
5  2 2018-10-15  54   56
6  2 2018-10-16 202   258

I generally will use subsets to do individual sumifs when I have those questions, but I'm not sure how to do this in a new column.

My end goal is to determine the dates that each ID (if applicable) crosses 50.

Thanks!

deslaur
  • 43
  • 3

1 Answers1

1

We can do a group by cumulative sum to create the 'C' column

library(dplyr)
df %>% 
  group_by(ID) %>% 
  mutate(C = cumsum(B))

Or use data.table

library(data.table)
setDT(df)[, C := cumsum(B), by = ID]

or with base R

df$C <- with(df, ave(B, ID, FUN = cumsum))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks! This seems to work. I used the base R method, however it seems that I need to have the data sorted by investment date for it to work correctly, right? – deslaur Nov 16 '18 at 19:55
  • @deslaur You can order by `investmentdate` before and then apply the `cumsum` on the `order`ed data – akrun Nov 16 '18 at 19:56