I have a data table with 3 columns: customer_id, time_period and bought_cookies (0 if no, 1 if yes). I want to create a new column (total_number_cookie_buyers) that sums the previous rows of bought_cookies to see how many people have bought cookies up to that point in time (e.g. if the first three rows of bought_cookies were 0, 1, 0 then the value in the third row of total_number_cookie_buyers would be 1). I've tried googling but can't find anything on how to do this!
Asked
Active
Viewed 41 times
0
-
Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including some example data in a plain text format. – neilfws Jan 21 '20 at 03:29
-
Look at the ‘rollapply’ function in the ‘zoo’ package. This is a convenient and easy to use method for cumulative calculations over a data frame. – Dale Kube Jan 21 '20 at 03:35
-
Maybe you just need `cumsum` by group. If you have `data.table` something like this should work, `dt[, total_number_cookie_buyers := cumsum(bought_cookies), customer_id]` Or maybe just `dt[, total_number_cookie_buyers := cumsum(bought_cookies)]` – Ronak Shah Jan 21 '20 at 03:37
1 Answers
0
The approach you are looking for is called cumulative sum. I think it is the solution.
cust_id <- NULL
for(i in 1:21){
if(i<10){ k <- paste("ID_00",i,sep="") } else{
k <- paste("ID_0",i,sep="") }
cust_id[i] <- k
}
date <- sample(seq(as.Date('2020/01/01'), as.Date('2020/01/21'), by="day"), 21)
date <- date[order(date)]
sales <- rbinom(21,1,0.5)
df <- data.frame(cust_id=cust_id,date=date,sales=sales)
df$salesydate <- cumsum(df$sales)
cust_id date sales salesdate
1 ID_001 2020-01-01 0 0
2 ID_002 2020-01-02 0 0
3 ID_003 2020-01-03 0 0
4 ID_004 2020-01-04 1 1
5 ID_005 2020-01-05 1 2
6 ID_006 2020-01-06 0 2
7 ID_007 2020-01-07 1 3
...................................

badalovi
- 124
- 8
-
The solution is really only the final line of code `df$salesydate <- cumsum(df$sales)`, the rest is just to build an example data set. You should separate the two and label them so the OP, who is a beginner, knows which is which. – Jan 21 '20 at 06:43