How to get cumulative sum based on a condition

Question

I have a dataset with millions of rows and 2 columns (ID, Amount). Amount is sorted in descending order. I need to get the cumulative sum of Amount based on a condition.

ID       Amount
101      40000
102      20000
103      15000
104      10000
......

For example, if there are 1000 rows, I need the cumulative sum of the first 1% (i.e. the first 10 rows after sorting), then 4% (40 rows), 15% (150 rows), 35% (350 rows) and below 50% (500 rows).

How do I get this in R?

@RafaelPereira I need the `cumsum` for the entire dataset, not for each `ID`. – mockash, May 19 '16 at 16:11

score 1 · Accepted Answer · answered May 19 '16 at 15:02

Why not:

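data <- 1:1000             # example data; use df$Amount for the real dataset
n <- length(data)
p <- 0.01                  # fraction of rows to keep (top 1%)
k <- floor(n * p)          # number of rows inside that fraction
cumsum(data[seq_len(k)])   # running total over the first k values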
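
To cover all the cutoffs from the question in one pass, the same idea can be applied to a vector of fractions. This is only a sketch: it assumes the values are already sorted in descending order, and `amount`, `cutoffs`, `running` and `totals` are illustrative names, not from the original post. It uses `round()` rather than `floor()` so that a product such as 149.999... is not truncated to 149.

```r
# Illustrative data: 1000 values sorted in descending order (an assumption;
# substitute the real Amount column here).
amount <- 1000:1
n <- length(amount)

# Cutoffs from the question: top 1%, 4%, 15%, 35% and 50% of rows.
cutoffs <- c(0.01, 0.04, 0.15, 0.35, 0.50)

# Number of rows that fall inside each cutoff.
k <- round(n * cutoffs)

# Compute the running total once, then pick out the entry at each cutoff row.
running <- cumsum(amount)
totals <- setNames(running[k], paste0(cutoffs * 100, "%"))
totals
```

Computing `cumsum` once and indexing into it avoids re-summing the same leading rows for every cutoff, which matters with millions of rows.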

rbm

In case I have duplicates in my dataset, will `cumsum` add the duplicates as well, or will it skip them? – mockash May 19 '16 at 18:38
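
`cumsum` adds every element in order, so duplicated values are counted each time they appear. A small check with made-up numbers (the `amount` vector below is invented for illustration):

```r
# Made-up amounts containing a duplicated value (20000 appears twice).
amount <- c(40000, 20000, 20000, 10000)

# Every element contributes to the running total, duplicates included.
with_dups <- cumsum(amount)

# Dropping duplicates first is a different calculation; only do this if
# repeated amounts really should be counted once.
without_dups <- cumsum(unique(amount))

with_dups
without_dups
```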

Eric Lecoutre · Answer 2 · 2016-05-19T15:11:24.543

0

I would begin by making sure the data frame is sorted. I assume you only want the aggregated cumsum, not the detail.

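percentage <- 0.1                              # fraction of rows (here the top 10%)
k <- round(quantile(0:nrow(df), percentage))   # row index at that fraction
cumsum(df$Amount)[k]                           # running total up to row k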
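
Putting the two steps together (sort, then look up the running total at the cutoff row) might look like the sketch below. The data frame holds the four sample rows from the question, deliberately shuffled. `top_share_total` is a hypothetical helper name, and it computes the cutoff row as `nrow(df) * percentage` rather than through `quantile()`. The `as.numeric()` call is there because `cumsum` on a long integer column can overflow.

```r
# Sample rows from the question, shuffled so the sort step does something.
df <- data.frame(ID = c(103, 101, 104, 102),
                 Amount = c(15000, 40000, 10000, 20000))

# Hypothetical helper: total Amount of the top `percentage` share of rows.
top_share_total <- function(df, percentage) {
  sorted <- df[order(df$Amount, decreasing = TRUE), ]   # largest first
  k <- round(nrow(sorted) * percentage)                 # rows inside the cutoff
  if (k < 1) return(0)                                  # cutoff covers no rows
  cumsum(as.numeric(sorted$Amount))[k]                  # running total at row k
}

top_share_total(df, 0.5)   # total of the top 2 of 4 rows
```

Returning 0 when the cutoff covers no rows is a design choice for this sketch; indexing with 0 would otherwise give an empty result.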

edited May 19 '16 at 15:11

answered May 19 '16 at 15:04

Eric Lecoutre
