calculate cumulative sum of one column based on another column's rank

Question

i have a dataset that looks like this:

amount    rank    category
4000      1       A
200       3       A
1000      2       A
10        4       A
500       1       B
...

I want to calculate the cumulative sum of amount based on ordering of rank, i.e. return:

cum      rank    category
4000     1       A
5000     2       A
5200     3       A
5210     4       A
...

any help would be nice! :)

possible duplicate of [summation of multiple columns grouped by multiple columns in R and output results as data frame](http://stackoverflow.com/questions/8212699/summation-of-multiple-columns-grouped-by-multiple-columns-in-r-and-output-result) — dave, Apr 02 '14 at 22:56
What you are asking for is a group by using two different columns. You are trying to sum, grouping by category and then rank. That question gives good answers on how to do it. — dave, Apr 02 '14 at 23:09
@dave, I agree that it's indeed possible to get to the answer from the link, but similar answers aren't considered duplicates (to my knowledge). That being said, there may very well be exact duplicates of this question, as it is indeed a trivial one. — Arun, Apr 02 '14 at 23:37
it's 'trivial' if you know the package that can give you the answer. — daikonradish, Apr 02 '14 at 23:40
@daikonradish, by trivial, what I mean is that these sort of questions are *so* common on SO that searching would land up on quite a few exact, if not related hits. It just takes a bit of effort on your part. — Arun, Apr 02 '14 at 23:49

score 1 · Answer 1 · answered Apr 02 '14 at 23:01

A data.table solution:

require(data.table) ## version >= 1.9.0
setDT(dat)          ## converts data.frame to data.table by reference

setkey(dat, category, rank) ## sort first by category, then by rank
dat[, csum := cumsum(amount), by=category]

#    amount rank category csum
# 1:   4000    1        A 4000
# 2:   1000    2        A 5000
# 3:    200    3        A 5200
# 4:     10    4        A 5210
# 5:    500    1        B  500

score 1 · Answer 2 · answered Nov 23 '16 at 10:28

A dplyr solution:

library(dplyr)

data = data.frame(amount = c(4000, 200, 1000, 10, 500),
                  rank = c(1, 3, 2, 4, 1),
                  category = c("A", "A", "A", "A","B"))

data %>% arrange(category, rank) %>% 
 group_by(category) %>% mutate(csum = cumsum(amount))

calculate cumulative sum of one column based on another column's rank

2 Answers2

Linked