Calculate cumulative sum (cumsum) by group

Question

With data frame:

df <- data.frame(id = rep(1:3, each = 5)
                 , hour = rep(1:5, 3)
                 , value = sample(1:15))

I want to add a column, csum, holding the cumulative sum of value within each id:

df
   id hour value csum
1   1    1     7    7
2   1    2     9   16
3   1    3    15   31
4   1    4    11   42
5   1    5    14   56
6   2    1    10   10
7   2    2     2   12
8   2    3     5   17
9   2    4     6   23
10  2    5     4   27
11  3    1     1    1
12  3    2    13   14
13  3    3     8   22
14  3    4     3   25
15  3    5    12   37

How can I do this efficiently? Thanks!
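For reference, here is a minimal sketch of what "cumulative sum by group" means, written as a plain loop. It hard-codes the values from the table above (the `sample()` call gives different values on every run) and assumes the rows are already sorted by id, and by hour within id. It only pins down the expected result; it is not meant to be efficient.

```r
# Values copied from the table above so the result is reproducible.
df <- data.frame(id = rep(1:3, each = 5),
                 hour = rep(1:5, 3),
                 value = c(7, 9, 15, 11, 14,
                           10, 2, 5, 6, 4,
                           1, 13, 8, 3, 12))

# Keep a running total and reset it whenever id changes.
# Assumes rows are sorted by id (and by hour within id).
csum <- numeric(nrow(df))
running <- 0
for (i in seq_len(nrow(df))) {
  if (i > 1 && df$id[i] != df$id[i - 1]) running <- 0
  running <- running + df$value[i]
  csum[i] <- running
}
df$csum <- csum
# df$csum is now 7 16 31 42 56 10 12 17 23 27 1 14 22 25 37
```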

IRTFM · Accepted Answer · 2019-08-30T01:06:25.123


df$csum <- ave(df$value, df$id, FUN=cumsum)

ave is the "go-to" function when you want a by-group result vector of the same length as an existing vector, computed from its sub-vectors alone. If by-group processing needs several "parallel" columns at once, the base strategy is do.call(rbind, by(dfrm, grp, FUN)).
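A sketch of that `by()` strategy applied to this question, using the question's table values hard-coded for reproducibility (an assumption). `by()` hands each per-id sub-data-frame to the function, so the function can use any of its columns; `do.call(rbind, ...)` stacks the pieces back together in group order.

```r
df <- data.frame(id = rep(1:3, each = 5),
                 hour = rep(1:5, 3),
                 value = c(7, 9, 15, 11, 14, 10, 2, 5, 6, 4, 1, 13, 8, 3, 12))

# FUN receives a whole sub-data-frame per id, not just one vector.
res <- do.call(rbind, by(df, df$id, function(d) {
  d$csum <- cumsum(d$value)
  d
}))
# rbind() builds row names from the group labels; reset them.
rownames(res) <- NULL
res
```

Because the pieces come back in group order, this approach reorders rows when the input is not already sorted by id, whereas `ave()` writes its results back into the original row positions.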

edited Aug 30 '19 at 01:06

answered May 31 '13 at 05:17 by IRTFM

Error in unique.default(x, nmax = nmax) : unique() applies only to vectors – Rock May 31 '13 at 05:19

I keep forgetting ... need to name the FUN argument. – IRTFM May 31 '13 at 05:19
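To illustrate why the name matters: `ave()` is defined as `ave(x, ..., FUN = mean)`, so every unnamed argument after `x` lands in `...` and is treated as a grouping variable. A small sketch with made-up data:

```r
df <- data.frame(id = rep(1:3, each = 5), value = 1:15)

# Unnamed: cumsum is taken as a second grouping variable, and turning the
# function into a factor fails, which should reproduce the
# "unique() applies only to vectors" error quoted above.
bad <- try(ave(df$value, df$id, cumsum), silent = TRUE)
inherits(bad, "try-error")   # TRUE

# Named: cumsum is applied within each id group.
good <- ave(df$value, df$id, FUN = cumsum)
good
```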

Note that you can add additional `id` variables if multiple columns define each unique row. e.g., `df$csum <- ave(df$value, df$id1, df$id2, FUN=cumsum)`. – Brian D Nov 18 '16 at 19:06
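A small sketch of the two-key version with made-up id1/id2 columns (hypothetical names, as in the comment). The groups are deliberately interleaved to show that `ave()` returns each result in its original row position:

```r
df2 <- data.frame(id1 = rep(c("a", "b"), each = 4),
                  id2 = rep(1:2, times = 4),
                  value = 1:8)

# Each (id1, id2) combination is its own group.
df2$csum <- ave(df2$value, df2$id1, df2$id2, FUN = cumsum)
df2
```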
@42- `plyr` was mothballed as of 2013 (six years ago already). You should be recommending `dplyr`/tidyverse/`data.table`. – smci Aug 29 '19 at 20:23
@smci: Did you look at the date of the comment? Are you suggesting I go back through all my comments and update them? And that's not to mention the fact that I don't really like either `plyr` or `dplyr`, anyway. (And I did mention `data.table`.) So I decided to just delete the comment and put the useful stuff in the answer. – IRTFM Aug 30 '19 at 01:07
@42- No, only important old ones that are both out of date and likely to be used as a close target for other questions. Thanks for updating the answer. Still, I never see `ave` used; these days it's all `dplyr` or `data.table`. (Why do you dislike them?) Could you edit your answer to list the latter two first? – smci Aug 30 '19 at 11:52

I didn't say I disliked `data.table`. I said I had mentioned it in the now-deleted comment. I voted up A5C1D2H2I1M1N2O1R2T1's solution, which I will admit is very clear. – IRTFM Aug 30 '19 at 14:15

A5C1D2H2I1M1N2O1R2T1 · Answer 2 · 2017-12-25T12:14:57.190

To add to the alternatives, data.table's syntax is nice:

library(data.table)
DT <- data.table(df, key = "id")
DT[, csum := cumsum(value), by = key(DT)]

Or, more compactly:

library(data.table)
setDT(df)[, csum := cumsum(value), id][]

The above will:

Convert the data.frame to a data.table by reference
Calculate the cumulative sum of value grouped by id and assign it by reference
Print the result of the entire operation (that is what the trailing [] does)

"df" will now be a data.table with a "csum" column.

score 25 · Answer 3 · edited Jun 25 '18 at 20:00


Using dplyr:

library(dplyr)
df %>% group_by(id) %>% mutate(csum = cumsum(value))

edited Jun 25 '18 at 20:00 by Henrik

answered Nov 13 '17 at 13:41 by tjebo


Hey, I tried your method. Somehow the grouping is not working properly: it computes cumsum over all the data points without grouping. Any suggestions? – Kathiravan Meeran Nov 15 '18 at 15:32
Sometimes starting a fresh R session helps in those cases. Try my code on the sample data. – tjebo Nov 15 '18 at 15:36

Just an update: you might have a package that has loaded `plyr`, whose `mutate()` can mask `dplyr`'s. Explicitly referencing `dplyr` will also fix it: `df %>% group_by(id) %>% dplyr::mutate(csum = cumsum(value))` – user3602585 Apr 10 '19 at 00:32
My explicit reference to dplyr does not yield an accurate by-group result; using `ave()` did. – Ben Jun 30 '23 at 15:47

score 8 · Answer 4 · answered May 31 '13 at 05:19


Using the plyr package:

library(plyr)
ddply(df, .(id), transform, csum = cumsum(value))
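Since `plyr` is retired (see the comments on the accepted answer), here is a sketch of the same split-apply-combine idea in base R, with no packages: `split()` the value column by id, apply `cumsum()` to each piece, and `unsplit()` the results back into the original row order. The data are the question's table values, hard-coded for reproducibility (an assumption).

```r
df <- data.frame(id = rep(1:3, each = 5),
                 hour = rep(1:5, 3),
                 value = c(7, 9, 15, 11, 14, 10, 2, 5, 6, 4, 1, 13, 8, 3, 12))

# split -> one numeric vector per id; lapply -> cumsum on each;
# unsplit -> reassemble in the original row positions.
df$csum <- unsplit(lapply(split(df$value, df$id), cumsum), df$id)
df
```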

answered May 31 '13 at 05:19 by Didzis Elferts

score 1 · Answer 5 · answered Jun 04 '22 at 23:48

Using base R:

df <- data.frame(id = rep(1:3, each = 5)
                 , hour = rep(1:5, 3)
                 , value = sample(1:15))

transform(df, csum = ave(value, id, FUN = cumsum))
#>    id hour value csum
#> 1   1    1     4    4
#> 2   1    2    12   16
#> 3   1    3    13   29
#> 4   1    4     6   35
#> 5   1    5     5   40
#> 6   2    1    15   15
#> 7   2    2     1   16
#> 8   2    3     2   18
#> 9   2    4     8   26
#> 10  2    5     9   35
#> 11  3    1    11   11
#> 12  3    2     7   18
#> 13  3    3    10   28
#> 14  3    4     3   31
#> 15  3    5    14   45

Created on 2022-06-05 by the reprex package (v2.0.1)
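One caveat that applies to every approach on this page: `cumsum()` accumulates in row order, so the result is a running total over hour only if the rows are sorted by hour within each id. A sketch, assuming a hypothetical copy of the question's data whose rows have been reversed:

```r
df <- data.frame(id = rep(1:3, each = 5),
                 hour = rep(1:5, 3),
                 value = c(7, 9, 15, 11, 14, 10, 2, 5, 6, 4, 1, 13, 8, 3, 12))

# Same rows reversed, so hour runs 5, 4, ..., 1 within each id.
shuffled <- df[nrow(df):1, ]

# Applied directly, the group-wise cumsum accumulates from hour 5 down.
naive <- transform(shuffled, csum = ave(value, id, FUN = cumsum))
naive$csum[naive$id == 1 & naive$hour == 1]   # 56, not 7

# Sorting by id, then hour, first restores the intended running total.
sorted <- shuffled[order(shuffled$id, shuffled$hour), ]
sorted <- transform(sorted, csum = ave(value, id, FUN = cumsum))
sorted
```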