7

Suppose data looks like

group1 group2 num
A      sg     1
A      sh     2
A      sg     4
B      at     3
B      al     7

a <- cumsum(data[,"num"]) # 1 3 7 10 17

I need something accumulated by groups. In reality,I have multiple columns as grouping indicators. I want to get the accumulated sum by the subgroup I define.

E.g

If I group by group1 only, then the output should be

group1 sum
A      1
A      3
A      7
B      3
B      10

If I group by two variables group1,group2 then the output is

group1 group2 sum
A      sg     1
A      sh     2
A      sg     5
B      at     3
B      al     7
lmo
  • 37,904
  • 9
  • 56
  • 69
Lovnlust
  • 1,507
  • 3
  • 29
  • 53
  • 10
    Try for instance: `ave(df$num,df$group1,FUN=cumsum)` for summing on just `group1` or `ave(df$num,df$group1,df$group2,FUN=cumsum)` for both groups. – nicola May 16 '15 at 14:52
  • 7
    This should be basic group by operation. Using `data.table` `setDT(data)[, cumsum(num), list(group1, group2)]` or `data %>% group_by(group1, group2) %>% mutate(sum=cumsum(num))` with `dplyr` – akrun May 16 '15 at 14:55
  • @nicola didn't notice `ave` has option for `FUN`! – Lovnlust May 16 '15 at 15:00
  • @akrun. I haven't heard of this package. Will check it out later. The syntax looks weird. – Lovnlust May 16 '15 at 15:00
  • @W_ee which one the `data.table` or `dplyr`. `dplyr` is created to make things easier to understand. – akrun May 16 '15 at 15:02
  • @akrun `dplyr`, Well, its name is already difficult to understand. Haven't found out what `dplyr` stands for. – Lovnlust May 16 '15 at 15:06
  • @W_ee I read in some posts that it acts like the `plier` – akrun May 16 '15 at 16:37
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/77997/discussion-between-w-ee-and-akrun). – Lovnlust May 17 '15 at 05:53

2 Answers2

9
library(data.table)

data <- data.table(group1=c('A','A','A','B','B'),sum=c(1,2,4,3,7))

data[,list(cumsum = cumsum(sum)),by=list(group1)]
Quinn Weber
  • 927
  • 5
  • 10
6

In addition to using data.table, tapply in base R works fine for both of these cases:

dta <- read.table(text="
group1 group2 num
A      sg     1
A      sh     2
A      sg     4
B      at     3
B      al     7", header=TRUE)

dta$cumsum <- do.call(c, tapply(dta$num, dta$group1, FUN=cumsum))

Calculating the cumulative sum by two groups requires some reordering:

dta <- dta[order(dta$group1, dta$group2, dta$num),]

dta$cumsum2 <- do.call(c, tapply(dta$num, 
                                 paste0(dta$group1, dta$group2), 
                                 FUN=cumsum))
dta
      group1 group2 num cumsum cumsum2
1      A     sg   1      1       1
3      A     sg   4      7       5
2      A     sh   2      3       2
5      B     al   7     10       7
4      B     at   3      3       3

And if you need the original order back:

dta[as.numeric(rownames(dta)),]
  group1 group2 num cumsum cumsum2
1      A     sg   1      1       1
2      A     sh   2      3       2
3      A     sg   4      7       5
4      B     at   3      3       3
5      B     al   7     10       7
Jason Morgan
  • 2,260
  • 21
  • 24
  • Nice! I would just add `simplify = FALSE` to that `tapply()` call for the sake of type safety. – landau Feb 26 '19 at 16:10