0

I have a data table with 3 columns: customer_id, time_period and bought_cookies (0 if no, 1 if yes). I want to create a new column (total_number_cookie_buyers) that sums the previous rows of bought_cookies to see how many people have bought cookies up to that point in time (e.g. if the first three rows of bought_cookies were 0, 1, 0 then the value in the third row of total_number_cookie_buyers would be 1). I've tried googling but can't find anything on how to do this!

Jess
  • 1
  • 1
  • Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including some example data in a plain text format. – neilfws Jan 21 '20 at 03:29
  • Look at the ‘rollapply’ function in the ‘zoo’ package. This is a convenient and easy to use method for cumulative calculations over a data frame. – Dale Kube Jan 21 '20 at 03:35
  • Maybe you just need `cumsum` by group. If you have `data.table` something like this should work, `dt[, total_number_cookie_buyers := cumsum(bought_cookies), customer_id]` Or maybe just `dt[, total_number_cookie_buyers := cumsum(bought_cookies)]` – Ronak Shah Jan 21 '20 at 03:37

1 Answers1

0

The approach you are looking for is called cumulative sum. I think it is the solution.

cust_id <- NULL
 for(i in 1:21){
  if(i<10){ k <- paste("ID_00",i,sep="") } else{
    k <- paste("ID_0",i,sep="") }
  cust_id[i] <- k 
}
date <- sample(seq(as.Date('2020/01/01'), as.Date('2020/01/21'), by="day"), 21)
date <- date[order(date)]
sales <- rbinom(21,1,0.5)
df <- data.frame(cust_id=cust_id,date=date,sales=sales)
df$salesydate <- cumsum(df$sales)

cust_id date sales salesdate

1 ID_001 2020-01-01 0 0

2 ID_002 2020-01-02 0 0

3 ID_003 2020-01-03 0 0

4 ID_004 2020-01-04 1 1

5 ID_005 2020-01-05 1 2

6 ID_006 2020-01-06 0 2

7 ID_007 2020-01-07 1 3

...................................

badalovi
  • 124
  • 8
  • The solution is really only the final line of code `df$salesydate <- cumsum(df$sales)`, the rest is just to build an example data set. You should separate the two and label them so the OP, who is a beginner, knows which is which. –  Jan 21 '20 at 06:43