I have a dataset with millions of rows and 2 columns (ID, Amount). Amount is sorted in descending order. I need to get the cumulative sum of Amount over the top rows, for several percentage cut-offs.

ID       Amount
101      40000
102      20000
103      15000
104      10000
......

For example, if there are 1000 rows, I need the cumulative sum of the first 1%, i.e. the first 10 rows after sorting, then of the first 4% (40 rows), 15% (150 rows), 35% (350 rows) and 50% (500 rows).

How do I get this in R?

mockash

2 Answers

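Why not:

data <- 1:1000  # example data; use your sorted Amount column here
n <- length(data)
frac <- 0.01  # top 1%; named frac so it does not mask the base function quantile()
cumsum(data[seq_len(floor(n * frac))])  # running total over the top 1% of rows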
rbm
  • In case I have duplicates in my dataset, will `cumsum` add the duplicates too, or will it skip them? – mockash May 19 '16 at 18:38
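
On the duplicates question, a small sketch may help. It assumes a made-up `amount` vector in which 20000 appears twice; the variable names are purely illustrative. `cumsum` adds every element in order, so repeated values are included, and `unique` is one way to drop them first if that is what is wanted.

```r
# Hypothetical amounts containing a repeated value (20000 appears twice).
amount <- c(40000, 20000, 20000, 15000, 10000)

# cumsum() adds every element in order; repeated values are not skipped.
running_total <- cumsum(amount)
print(running_total)

# If each distinct amount should be counted only once, drop duplicates first.
running_total_unique <- cumsum(unique(amount))
print(running_total_unique)
```

Whether dropping duplicates makes sense depends on the data: two different IDs with the same Amount are usually both meant to count.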
I would begin by making sure the data frame is sorted. I assume you only want the aggregated cumulative sum at the cut-off, not the row-by-row detail.

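df <- df[order(df$Amount, decreasing = TRUE), ]  # make sure Amount is sorted in descending order
percentage <- 0.1  # fraction of rows to include (here the top 10%)
cumsum(df$Amount)[round(quantile(0:nrow(df), percentage))]  # cumulative sum at the cut-off row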
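
The same idea can be applied to all of the cut-offs from the question in one pass. The following is only a sketch under stated assumptions: the data frame is randomly generated for illustration, and the names `fractions`, `n_rows`, `running_total` and `result` are made up here. It uses `round` for the row counts; `floor` would be an equally reasonable choice when the row count is not a multiple of 100.

```r
# Hypothetical data: 1000 rows with made-up amounts.
set.seed(1)
df <- data.frame(ID = 1:1000, Amount = sample(1:50000, 1000))

# Sort by Amount, largest first.
df <- df[order(df$Amount, decreasing = TRUE), ]

# Fractions of rows requested in the question.
fractions <- c(0.01, 0.04, 0.15, 0.35, 0.50)

# Number of rows in each top slice; use at least one row per slice.
n_rows <- pmax(1, round(nrow(df) * fractions))

# Compute the running total once, then read it off at each cut-off row.
running_total <- cumsum(df$Amount)
result <- data.frame(
  fraction   = fractions,
  rows       = n_rows,
  cum_amount = running_total[n_rows]
)
print(result)
```

Computing `cumsum` once and indexing into it avoids re-summing the column for every percentage, which matters with millions of rows.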
Eric Lecoutre