0

i have a dataset that looks like this:

amount    rank    category
4000      1       A
200       3       A
1000      2       A
10        4       A
500       1       B
...

I want to calculate the cumulative sum of amount based on ordering of rank, i.e. return:

cum      rank    category
4000     1       A
5000     2       A
5200     3       A
5210     4       A
...

any help would be nice! :)

daikonradish
  • 682
  • 5
  • 14
  • 1
    possible duplicate of [summation of multiple columns grouped by multiple columns in R and output results as data frame](http://stackoverflow.com/questions/8212699/summation-of-multiple-columns-grouped-by-multiple-columns-in-r-and-output-result) – dave Apr 02 '14 at 22:56
  • 1
    doesn't really look related? – daikonradish Apr 02 '14 at 23:03
  • What you are asking for is a group by using two different columns. You are trying to sum, grouping by category and then rank. That question gives good answers on how to do it. – dave Apr 02 '14 at 23:09
  • @dave, I agree that it's indeed possible to get to the answer from the link, but similar answers aren't considered duplicates (to my knowledge). That being said, there may very well be exact duplicates of this question, as it is indeed a trivial one. – Arun Apr 02 '14 at 23:37
  • it's 'trivial' if you know the package that can give you the answer. – daikonradish Apr 02 '14 at 23:40
  • 1
    @daikonradish, by trivial, what I mean is that these sort of questions are *so* common on SO that searching would land up on quite a few exact, if not related hits. It just takes a bit of effort on your part. – Arun Apr 02 '14 at 23:49

2 Answers2

1

A data.table solution:

require(data.table) ## version >= 1.9.0
setDT(dat)          ## converts data.frame to data.table by reference

setkey(dat, category, rank) ## sort first by category, then by rank
dat[, csum := cumsum(amount), by=category]

#    amount rank category csum
# 1:   4000    1        A 4000
# 2:   1000    2        A 5000
# 3:    200    3        A 5200
# 4:     10    4        A 5210
# 5:    500    1        B  500
Arun
  • 116,683
  • 26
  • 284
  • 387
1

A dplyr solution:

library(dplyr)

data = data.frame(amount = c(4000, 200, 1000, 10, 500),
                  rank = c(1, 3, 2, 4, 1),
                  category = c("A", "A", "A", "A","B"))

data %>% arrange(category, rank) %>% 
 group_by(category) %>% mutate(csum = cumsum(amount))
Daniel Gardiner
  • 936
  • 8
  • 11