cumulative sum of array of rows

Question

I have this data frame:

transaction ID	day number	Predicted value
12	1	.001
12	2	.002
12	1	.001
12	2	.002
13	1	.001
13	2	.002
13	3	.002
13	4	.003

I want to take the cumulative sum of each set of predicted values, where a set is a run of sequential day numbers (i.e. the cumsum of the first 2 rows, the cumsum of the next 2, and the cumsum of the last 4).

So the results would be .003, .003, .008.

score 2 · Accepted Answer · answered Jan 19 '23 at 16:17

Using base R:

sapply(split(df$Predicted_value,cumsum(c(1,diff(df$day_number)!=1))), sum)
   1     2     3 
0.003 0.003 0.008
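
To make the grouping key easier to follow, here is a minimal sketch that unpacks the one-liner step by step. The data frame is rebuilt from the question's table, and the column names (`transaction_ID`, `day_number`, `Predicted_value`) are assumed to match the ones used in the answers; the intermediate names `step`, `new_run`, `run_id`, and `run_sums` are introduced only for illustration.

```r
# Data from the question; column names follow the answers' code (an assumption).
df <- data.frame(
  transaction_ID  = c(12, 12, 12, 12, 13, 13, 13, 13),
  day_number      = c(1, 2, 1, 2, 1, 2, 3, 4),
  Predicted_value = c(0.001, 0.002, 0.001, 0.002, 0.001, 0.002, 0.002, 0.003)
)

# diff() gives the change between consecutive day numbers.
step <- diff(df$day_number)      # 1 -1 1 -1 1 1 1

# A new run starts wherever the step is not exactly +1;
# the first row always starts a run.
new_run <- c(TRUE, step != 1)

# cumsum() over the flags numbers the runs 1, 2, 3, ...
run_id <- cumsum(new_run)        # 1 1 2 2 3 3 3 3

# Sum the predicted values within each run.
run_sums <- sapply(split(df$Predicted_value, run_id), sum)
print(run_sums)
```

The run ids come out as 1 1 2 2 3 3 3 3, which is why the sums cover the first 2 rows, the next 2, and the last 4.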

Jilber Urbina


Maybe using `tapply` instead of `split` and `sapply` is simpler. `tapply(df$Predicted_value, cumsum(c(1,diff(df$day_number)!=1)), sum)` – GKi Jan 19 '23 at 16:42
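
A quick way to check the comment's suggestion is to compute both forms on the question's data and compare the values. This is only a sketch: the data frame is rebuilt from the question, and the names `via_split`, `via_tapply`, and `same_values` are made up for the comparison.

```r
# Day numbers and predicted values from the question.
df <- data.frame(
  day_number      = c(1, 2, 1, 2, 1, 2, 3, 4),
  Predicted_value = c(0.001, 0.002, 0.001, 0.002, 0.001, 0.002, 0.002, 0.003)
)

# Shared grouping key: a new id whenever the day number does not increase by 1.
run_id <- cumsum(c(1, diff(df$day_number) != 1))

via_split  <- sapply(split(df$Predicted_value, run_id), sum)
via_tapply <- tapply(df$Predicted_value, run_id, sum)

# Compare values only: tapply() returns a 1-d array, sapply() a named vector,
# so as.numeric() is used to drop the differing attributes first.
same_values <- isTRUE(all.equal(as.numeric(via_split), as.numeric(via_tapply)))
print(same_values)
```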

score 0 · Answer 2 · answered Jan 19 '23 at 16:17

Using the answer from this post:

library(dplyr)

df %>%
  group_by(transaction_ID) %>%
  # start a new id whenever the day number does not increase by 1
  mutate(id = cumsum(c(1, diff(day_number) != 1))) %>%
  group_by(transaction_ID, id) %>%
  summarise(result = sum(Predicted_value)) %>%
  ungroup()

  transaction_ID    id result
           <int> <dbl>  <dbl>
1             12     1  0.003
2             12     2  0.003
3             13     1  0.008
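
Grouping by transaction ID before numbering the runs matters when one transaction happens to continue the day sequence of the previous one. The sketch below illustrates this with hypothetical data (not the question's) and uses base R's `ave()` and `aggregate()` rather than dplyr, purely to keep the example dependency-free; the names `toy`, `global_id`, and `within_id` are made up.

```r
# Hypothetical data: transaction 13 continues the day sequence
# where transaction 12 left off.
toy <- data.frame(
  transaction_ID  = c(12, 12, 13, 13),
  day_number      = c(1, 2, 3, 4),
  Predicted_value = c(0.001, 0.002, 0.002, 0.003)
)

# Run ids computed over the whole column: a single run, so both
# transactions end up in the same group.
global_id <- cumsum(c(1, diff(toy$day_number) != 1))
global_sums <- tapply(toy$Predicted_value, global_id, sum)

# Run ids restarted within each transaction.
within_id <- ave(toy$day_number, toy$transaction_ID,
                 FUN = function(d) cumsum(c(1, diff(d) != 1)))

# Sum per (transaction, run) pair.
grouped_sums <- aggregate(Predicted_value ~ transaction_ID + id,
                          data = cbind(toy, id = within_id), FUN = sum)

print(global_sums)
print(grouped_sums)
```

With the global key the four rows collapse into one sum, while the per-transaction key keeps one row per transaction.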

mattek · Answer 3 · 2023-01-19T16:39:41.373

Based on your desired output, it's not a cumulative sum but a sum by transaction ID and day group.

Using data.table:

library(data.table)

dat = data.table(transID = c(12,...),
                 dayNum = c(1,2,...),
                 predVal = c(0.001, 0.002, ...))

# introduce a grouping column; each group starts when day == 1
dat[, gr := cumsum(dayNum == 1)]

# aggregate
dat[, sum(predVal), by = gr]

    gr    V1
1:  1 0.003
2:  2 0.003
3:  3 0.008
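
The rule "each group starts when day == 1" and the "day number does not increase by 1" rule from the other answers give the same groups on the question's data, but they can differ elsewhere. The sketch below compares the two grouping keys in base R (not data.table, to keep it self-contained); the second vector, `gap_days`, is hypothetical and not from the question.

```r
# Day numbers from the question.
days <- c(1, 2, 1, 2, 1, 2, 3, 4)

by_restart <- cumsum(days == 1)                  # new group when day == 1
by_gap     <- cumsum(c(1, diff(days) != 1))      # new group when step != 1

# On the question's data both keys are 1 1 2 2 3 3 3 3.
keys_agree <- all(by_restart == by_gap)

# Hypothetical day numbers with a gap but no restart at day 1.
gap_days <- c(1, 2, 5, 6)

gap_by_restart <- cumsum(gap_days == 1)              # 1 1 1 1
gap_by_gap     <- cumsum(c(1, diff(gap_days) != 1))  # 1 1 2 2

print(keys_agree)
print(gap_by_restart)
print(gap_by_gap)
```

Which rule is appropriate depends on whether a jump such as day 2 to day 5 should count as the start of a new set.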
