1

Here is my data :

class x1  x2
c     6   90
b     5   50
c     3   70
b     9   40
a     5   30
b     1   60
a     7   20
c     4   80
a     2   10

I first want to order it by class (increasing or decreasing doesn't really matter) and then by x1 (decreasing), so I do the following :

df <- df[with(df, order(class, x1, decreasing = TRUE))]

class x1  x2
c     6   90
c     4   80
c     3   70
b     9   40
b     5   50
b     1   60
a     7   20
a     5   30
a     2   10

And then I would like the cumulative sum over x1 for each class :

class x1  x2  cumsum  
c     6   90  90      
c     4   80  170     # 90+80
c     3   70  240     # 90+80+70
b     9   40  40      
b     5   50  90      # 40+50
b     1   60  150     # 40+50+60
a     7   20  20
a     5   30  50      # 20+30
a     2   10  60      # 20+30+10

Following this answer, I did this :

df$cumsum <- unlist(by(df$x2, df$class, cumsum))
# (Also tried this, same result)
df$cumsum <- unlist(by(df[,x2], df[,class], cumsum))

But what I get is a cumulative sum over the whole set + misordered. To be more specific, Here is what I get :

class x1  x2  cumsum  
c     6   90  20   # this cumsum
c     4   80  50   # and this cumsum
c     3   70  60   # and this cumsum are the cumsum of the lines of class a,
b     9   40  100  # then it adds the 'x2' values of class b : 60 ('cumsum' from the previous line) + 40   
b     5   50  150  # and keeps doing so : 100 + 50
b     1   60  210  # 150 + 60
a     7   20  300  # 210 + 90
a     5   30  380  # 300 + 80
a     2   10  450  # 380 + 70

Any idea on how I could solve this ? Thanks

Community
  • 1
  • 1
François M.
  • 4,027
  • 11
  • 30
  • 81

3 Answers3

3

dplyr can work here too

library(dplyr)

df %>%  
  group_by(class) %>%  
  arrange(desc(x1)) %>%
  mutate(cumsum=cumsum(x2))

##    class    x1    x2 cumsum
##   (fctr) (int) (int)  (int)
## 1      a     7    20     20
## 2      a     5    30     50
## 3      a     2    10     60
## 4      b     9    40     40
## 5      b     5    50     90
## 6      b     1    60    150
## 7      c     6    90     90
## 8      c     4    80    170
## 9      c     3    70    240

As described here (https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html) and elsewhere, the group_by in conjunction with arrange implies that the data will be sorted by the grouping variable first.

joemienko
  • 2,220
  • 18
  • 27
2

We can use data.table

library(data.table)
setDT(df)[, x2:= cumsum(x2) , class]
df
#   class x1  x2
#1:     c  6  90
#2:     c  4 170
#3:     c  3 240
#4:     b  9  40
#5:     b  5  90
#6:     b  1 150
#7:     a  7  20
#8:     a  5  50
#9:     a  2  60

NOTE: In the above I used the ordered data If we need to order also,

setorder(setDT(df), -class, -x1)[, x2:=cumsum(x2), class]
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Or `setDT(df)[order(-class, -x1), x2 := cumsum(x2), by = class]` which will retain the original order of `df` too. – Arun May 18 '16 at 13:03
  • 1
    @Arun Thanks, I was thinking in that way, but later I thought the OP wanted to rearrange the rows. – akrun May 18 '16 at 13:12
0

You can use base R transform and ave to cumsum over the class column

transform(df[order(df$class, decreasing = T), ], cumsum = ave(x2, class, FUN=cumsum))


#  class x1 x2 cumsum
#1     c  6 90     90
#3     c  3 70    160
#8     c  4 80    240
#2     b  5 50     50
#4     b  9 40     90
#6     b  1 60    150
#5     a  5 30     30
#7     a  7 20     50
#9     a  2 10     60
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213