
This seems like a simple problem, but for some reason I haven't been able to find a solution.

I have a matrix of probabilities that sum to 1, and I want to know at which value the cumulative sum reaches, for example, 0.5. In other words, if I turned this matrix into a sorted vector, how far would I have to go from the highest value to get a cumulative sum of 0.5?

I transformed my matrix into a vector of values and used plot(cumsum(x)) to produce the following graph:

[Figure: Cumulative Sum of Vector Values]

I can do something like

P<-ecdf(x)
P(0.00001)

to get the cumulative sum at an x value of 0.00001, but I want to go in the other direction, i.e. what is the x value at a cumulative sum of 0.5?

quantile() gives me the value at 50% of the ordered values (e.g. it would give me the value of sort(x)[4e+05] in the graph above), which is not what I'm after.

Thanks for your help with this seemingly simple question!

Cheers, Josh

Solution:

x[max(which(cumsum(x)<=0.5))]

gives the last value of x before the cumulative sum exceeds 0.5 (thanks @plafort), although it seems as though there should be an easier way!
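The one-liner above works on `x` in whatever order it is stored, so to walk down from the highest value the vector probably needs to be sorted in decreasing order first. A minimal sketch, assuming a small made-up probability matrix `m` (not the asker's data) and a target of 0.5:

```r
# Hypothetical example data: a 2 x 3 probability matrix whose entries sum to 1.
m <- matrix(c(0.05, 0.30, 0.10, 0.25, 0.20, 0.10), nrow = 2)

# Flatten the matrix and sort from the highest probability down.
x <- sort(as.vector(m), decreasing = TRUE)
cs <- cumsum(x)

# Last position whose running total is still at or below the target.
target <- 0.5
idx <- max(which(cs <= target))
value_at_target <- x[idx]
```

If even the largest value is above the target, `which()` returns an empty vector and `max()` gives `-Inf` with a warning, so that case would need its own check.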

stewart6
  • It would be helpful if you included a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data and the desired output for that sample. `cumsum` returns a vector, so I don't understand how you can do `P(0.00001)`, because `P` should be a vector, not a function. – MrFlick Jun 10 '15 at 19:37
  • As @MrFlick correctly pointed out, `P` must be a function, in your case with a plot - it must be a fit of an analytical (and typically smooth) function to your data points – Alexey Ferapontov Jun 10 '15 at 19:41
  • ditto what was said before. Here's a simple ex. `vec <- seq(.01, 1, length.out=30)`. Solution: `max(which(cumsum(vec) <= 0.5))` will give a position number and `cumsum(vec)[max(which(cumsum(vec) <= 0.5))]` will give the value that approaches 0.5 but doesn't go over. – Pierre L Jun 10 '15 at 19:44
  • Hi guys, apologies on two fronts: 1) I figured I was missing something so simple that I didn't include reproducible data, and 2) I was using ecdf() to get the corresponding cumulative sum, and not cumsum() (fixed in the text above now). @plafort, your code does the trick, but it seems like there ought to just be a function that does the opposite of ecdf()! – stewart6 Jun 10 '15 at 19:59
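On the wish for a function that goes in the opposite direction: base R does not seem to offer this lookup as a single call, but it can be wrapped in a small helper. The sketch below is one possible way, not an established function; the name `value_at_cumsum` and the sample vector `probs` are made up for illustration. It uses `findInterval()`, which counts how many running totals are at or below the target.

```r
# Hypothetical helper (not a base R function): returns the last value of the
# descending-sorted input whose running total is still <= target.
value_at_cumsum <- function(x, target) {
  sorted <- sort(as.vector(x), decreasing = TRUE)
  # findInterval() needs a non-decreasing vector; the running totals satisfy
  # this as long as the inputs are non-negative (e.g. probabilities).
  n <- findInterval(target, cumsum(sorted))
  if (n == 0) return(NA_real_)  # even the largest value alone exceeds target
  sorted[n]
}

# Made-up probabilities that sum to 1.
probs <- c(0.1, 0.4, 0.2, 0.3)
value_at_cumsum(probs, 0.5)
```

Returning `NA` when no running total fits under the target is a design choice of this sketch; raising an error would be equally reasonable.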

1 Answer


I think I get what you want. Here is my solution, where the goal is to find the element of the matrix at which the cumulative sum first reaches a threshold (>= 20, for example). I still think there must be a much easier way to achieve this.

set.seed(1)
data <- matrix(rnorm(9, 10), 3, 3)
data
#>           [,1]      [,2]     [,3]
#> [1,]  9.373546 11.595281 10.48743
#> [2,] 10.183643 10.329508 10.73832
#> [3,]  9.164371  9.179532 10.57578
which(cumsum(data) >= 500)[1]
#> [1] NA
which(cumsum(data) >= 20)[1]
#> [1] 3
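Note that `cumsum()` on a matrix runs down the columns (column-major order), so the index returned above is a linear index into the matrix. A small sketch of turning it back into a row and column with `arrayInd()`, using a made-up integer matrix `m` rather than the random data above:

```r
# Hypothetical 3 x 3 matrix, filled column by column.
m <- matrix(c(9, 10, 9, 12, 10, 9, 10, 11, 11), nrow = 3)

# First linear index at which the running total reaches the threshold.
first_idx <- which(cumsum(m) >= 20)[1]

# Convert the linear index into (row, column) coordinates.
pos <- arrayInd(first_idx, dim(m))
```

Here the running totals down the first column are 9, 19 and 28, so the threshold of 20 is first reached at the third element, which is row 3, column 1.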
SabDeM