17

So I have a dataset which simplified looks something like this:

Year    ID     Sum
2009    999    100
2009    123     85
2009    666    100
2009    999    100
2009    123     90
2009    666     85
2010    999    100
2010    123    100
2010    666     95
2010    999     75
2010    123    100
2010    666     85

I'd like to add a column with the cumulative sum, by year and ID. Like this:

Year     ID     Sum    Cum.Sum
2009    999     100        100
2009    123      85         85  
2009    666     100        100  
2009    999     100        200
2009    123      90        175
2009    666      85        185
2010    999     100        100
2010    123     100        100
2010    666      95         95
2010    999      75        175
2010    123     100        200
2010    666      85        180

I think this should be pretty straight-forward, but somehow I haven't been able to figure it out. How do I do this? Thanks for the help!

Alex T
  • 343
  • 1
  • 4
  • 9
  • 9
    ... Can you share what you've tried and where you have looked? Take a look to the right at the "related" section or try googling for `R cumulative sum`. In the future, please do some research first before posting here. – Justin Sep 20 '13 at 20:51

3 Answers3

42

Using data.table:

require(data.table)
DT <- data.table(DF)
DT[, Cum.Sum := cumsum(Sum), by=list(Year, ID)]

    Year  ID Sum Cum.Sum
 1: 2009 999 100     100
 2: 2009 123  85      85
 3: 2009 666 100     100
 4: 2009 999 100     200
 5: 2009 123  90     175
 6: 2009 666  85     185
 7: 2010 999 100     100
 8: 2010 123 100     100
 9: 2010 666  95      95
10: 2010 999  75     175
11: 2010 123 100     200
12: 2010 666  85     180
Arun
  • 116,683
  • 26
  • 284
  • 387
1

Another way

1) use ddply to sum a variable by group (similar to SQL group by)

X <- ddply ( dataset, .(Year,ID), sum)

2) merge the result with dataset

Y <- merge( dataset, X, by=('Year','ID')
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
holgrich
  • 351
  • 6
  • 23
0

You can use dplyr, and the base function cumsum:

require(dplyr)    

dataset %>% 
  group_by(Year, ID) %>%
  mutate(cumsum = cumsum(Sum)) %>%
  ungroup()