How can I find the index of an element, when the element is determined using quantile()? The match() and which() solutions from this similar question do not work (they return NA), and I think they don't work because of rounding issues.

In the case that the quantile result is averaged/interpolated across two indices, can I specify if it takes the lower/higher index? My data x will always be sorted.

Example dataset (the 0 and 1 quantiles here are just the min and max; they are shown only as a sanity check):

x <- c(0.000000e+00,9.771228e-09,5.864592e-06,3.474925e-04,9.083242e-04,2.458036e-02)
quantile(x, probs = c(0, 0.5, 1))
          0%          50%         100% 
0.0000000000 0.0001766785 0.0245803600 

How do I find the indices for these quantiles? Here, the indices are 1, ??, 6. I guess the median is the average of the values at two indices, so can I specify whether it returns the first or the second index?

jay.sf
a11

2 Answers

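Use `findInterval`? For a sorted `x`, it returns the index of the last element that is less than or equal to each quantile: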

x <- c(0.000000e+00, 9.771228e-09, 5.864592e-06, 3.474925e-04,
       9.083242e-04, 2.458036e-02)
findInterval(quantile(x, probs = c(0, 0.5, 1)), x)
#[1] 1 3 6
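
If the higher of the two bracketing indices is wanted instead, one possible sketch is to add 1 wherever the quantile is not itself an element of `x`. This assumes `x` is sorted and has no ties; the names `lower`, `exact`, and `upper` are only illustrative.

```r
x <- c(0.000000e+00, 9.771228e-09, 5.864592e-06, 3.474925e-04,
       9.083242e-04, 2.458036e-02)
q <- unname(quantile(x, probs = c(0, 0.5, 1)))

# Index of the last element of x that is <= each quantile
lower <- findInterval(q, x)

# TRUE where the quantile coincides with an element of x (here: min and max)
exact <- x[lower] == q

# Move up by one only where the quantile falls strictly between two elements
upper <- ifelse(exact, lower, lower + 1L)

lower
# [1] 1 3 6
upper
# [1] 1 4 6
```

With repeated values in `x`, `findInterval` returns the last index among the ties, so the result may need adjusting for such data.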
Ronak Shah
  • Do you know why it says the 50th %tile is in position 3, but the quantile function returns 0.0001766785, rather than 5.864592e-06, for the 50th %tile? – a11 Jan 26 '21 at 03:15
  • `findInterval` gives the index of the last element of `x` that does not exceed the number. Looking at `x - quantile(x, probs = 0.5)` might help to understand. – Ronak Shah Jan 26 '21 at 03:26
  • I think maybe this is a different question, but I don't understand why `quantile(x, probs = 0.5)` returns 0.0001766785. From the `x` dataset, it looks like the median should be 5.864592e-06, right? – a11 Jan 26 '21 at 03:42
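
On why the default returns 0.0001766785: `quantile()` defaults to `type = 7`, which interpolates linearly between the order statistics around position (n - 1) * p + 1. With six values and p = 0.5 that position is 3.5, halfway between the third and fourth elements. A small sketch of the arithmetic (variable names are illustrative):

```r
x <- c(0.000000e+00, 9.771228e-09, 5.864592e-06, 3.474925e-04,
       9.083242e-04, 2.458036e-02)
n <- length(x)
p <- 0.5

# Fractional position used by the default type = 7
h <- (n - 1) * p + 1          # 3.5
lo <- floor(h)                # 3
hi <- ceiling(h)              # 4

# Linear interpolation between the two neighbouring sorted values
q7 <- x[lo] + (h - lo) * (x[hi] - x[lo])

# Agrees with quantile() up to floating-point tolerance
all.equal(q7, unname(quantile(x, probs = 0.5)))
```

So the default median here is the mean of `x[3]` and `x[4]`, which is not an element of `x`, and that is why `match()` returns NA for it.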

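You probably want `type=4`, which uses linear interpolation of the empirical CDF. For this `x` (n = 6, p = 0.5), n * p is a whole number, so the result is an actual element of `x` (the lower of the two middle values) rather than an average of two values, and `match()` can find it.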

x <- c(0.000000e+00,9.771228e-09,5.864592e-06,3.474925e-04,9.083242e-04,2.458036e-02)
(q <- quantile(x, probs=c(0, 0.5, 1), type=4))
#           0%          50%         100% 
# 0.000000e+00 5.864592e-06 2.458036e-02 
match(q, x)
# [1] 1 3 6
x[match(q, x)]
# [1] 0.000000e+00 5.864592e-06 2.458036e-02

Another example:

set.seed(42)
x <- runif(1e3)
(q <- quantile(x, probs=c(0, 0.5, 1), type=4))
#           0%          50%         100% 
# 0.0002388966 0.4803101290 0.9984908344 
match(q, x)
# [1]  92 174 917
x[match(q, x)]
# [1] 0.0002388966 0.4803101290 0.9984908344
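
One caveat: `type = 4` only lands on an actual element when n * p is a whole number; otherwise it interpolates and `match()` returns NA again. As a sketch with made-up data (`y` is hypothetical), `type = 1` (or `type = 3`) always returns an element of the data, so it may be the safer choice when an index is needed:

```r
# Hypothetical data with odd length: n * 0.5 = 3.5 is not a whole number
y <- c(1, 2, 3, 4, 5, 6, 7)

q4 <- quantile(y, probs = 0.5, type = 4, names = FALSE)
i4 <- match(q4, y)   # NA, because 3.5 is interpolated and not in y

q1 <- quantile(y, probs = 0.5, type = 1, names = FALSE)
i1 <- match(q1, y)   # 4, because type 1 returns an order statistic

# Same idea on the data from the question
x <- c(0.000000e+00, 9.771228e-09, 5.864592e-06, 3.474925e-04,
       9.083242e-04, 2.458036e-02)
idx <- match(quantile(x, probs = c(0, 0.5, 1), type = 1), x)
idx
# [1] 1 3 6
```

When n * p is a whole number, as in the question's data, `type = 1` returns the lower of the two neighbouring elements.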
jay.sf