I have a data frame like this:

data <- data.frame(
  ID    = c("0001","0002","0003","0004","0004","0004","0001","0001","0002","0003"),
  Saldo = c(10,10,10,15,20,50,100,80,10,10),
  place = c("grocery","market","market","cars","market","market","cars","grocery","cars","cars")
)

I was trying to calculate the total of Saldo for each individual in the ID variable using cumsum or apply, but I don't get the result I want. I would like something like this:

  ID      Saldo.Total
1 0001         190
2 0002          20
3 0003          20
4 0004          85 
mnel
Duck
  • This is close to the most frequently asked question on Stack Overflow. http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega/7141669#7141669 is the canonical question and answer, but a search for `[r] group by` or `[r] aggregate` would get you close. Your example appears to show the total, not the cumulative sum. – mnel Mar 14 '13 at 02:34
  • If either of the answers below solved your problem, please mark one as accepted. – Ari B. Friedman Mar 23 '13 at 03:35
  • I really don't think that *every* split-apply_some_fun-combine question needs to get closed as a duplicate of the canonical question. That isn't very helpful as it isn't specific enough, especially for new users. – Gavin Simpson Mar 31 '13 at 16:25

2 Answers

You can use aggregate:

> aggregate(Saldo ~ ID, data, sum) ## total Saldo per ID
    ID Saldo
1 0001   190
2 0002    20
3 0003    20
4 0004    85
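
The result above names the total column Saldo, while the question asks for Saldo.Total. A minimal sketch of one way to get that header, assuming base R only and the example data from the question; the object name `totals` is hypothetical and used only for illustration:

```r
# Example data from the question
data <- data.frame(
  ID    = c("0001","0002","0003","0004","0004","0004","0001","0001","0002","0003"),
  Saldo = c(10,10,10,15,20,50,100,80,10,10),
  place = c("grocery","market","market","cars","market","market","cars","grocery","cars","cars")
)

# Sum Saldo within each ID, then rename the columns to match the desired output.
# `totals` is an illustrative name, not part of the original answer.
totals <- setNames(aggregate(Saldo ~ ID, data, sum), c("ID", "Saldo.Total"))
print(totals)
```

Renaming after the fact keeps the formula interface of aggregate unchanged; the same sums are computed either way.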

If you're really interested in a cumulative sum by ID, try the following:

within(data, {
  Saldo.Total <- ave(Saldo, ID, FUN = cumsum)
})
#     ID Saldo   place Saldo.Total
# 1  0001    10 grocery          10
# 2  0002    10  market          10
# 3  0003    10  market          10
# 4  0004    15    cars          15
# 5  0004    20  market          35
# 6  0004    50  market          85
# 7  0001   100    cars         110
# 8  0001    80 grocery         190
# 9  0002    10    cars          20
# 10 0003    10    cars          20
A5C1D2H2I1M1N2O1R2T1

I think you may have gotten confused: what you want is not really a cumulative sum, it's just a sum:

library(plyr)
ddply(
  data,
  .(ID),
  summarize,
  Saldo.Total=sum(Saldo)
  )

Output:

    ID Saldo.Total
1 0001         190
2 0002          20
3 0003          20
4 0004          85

A cumulative sum is the "running total" as you move along the vector, e.g.:

> x = c(1, 2, 3, 4, 5)
> cumsum(x)
[1]  1  3  6 10 15
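
To make the relationship between the two concrete, here is a small sketch (the variable names `running`, `total`, and `y` are illustrative, not from the question). The last element of a running total always equals the plain sum, whereas the largest element of the running total only equals the sum when no value is negative:

```r
x <- c(1, 2, 3, 4, 5)
running <- cumsum(x)   # running total along the vector
total   <- sum(x)      # single overall total

# The final element of the running total is the plain sum
running[length(running)] == total   # TRUE

# With a negative entry, the maximum of the running total is no longer the sum
y <- c(10, -4, 3)
cumsum(y)        # 10  6  9
max(cumsum(y))   # 10
sum(y)           # 9
```

So when only the total per group is needed, sum is the direct choice; cumsum is useful when the intermediate values matter.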
Marius