
I don't understand the following behavior of quantile. With type=2 it should average at discontinuities, but this doesn't always seem to happen. If I create a sequence of 100 numbers and look at the percentiles, shouldn't I get the average at every percentile? This happens for some percentiles, but not for all (e.g. the 7th percentile).

quantile(seq(1, 100, 1), 0.05, type=2)
# 5%
# 5.5 

quantile(seq(1, 100, 1), 0.06, type=2)
# 6%
# 6.5 

quantile(seq(1, 100, 1), 0.07, type=2)
# 7%
# 8 

quantile(seq(1, 100, 1), 0.08, type=2)
# 8%
# 8.5 

Is this related to floating point issues?

100*0.06 == 6
#TRUE

100*0.07 == 7 
#FALSE

sprintf("%.20f", 100*0.07)
#"7.00000000000000088818"
ALollz
    FYI, your second code-block is related to [R FAQ 7.31](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f) and https://stackoverflow.com/q/9508518/3358272. – r2evans Apr 23 '20 at 18:21

1 Answer


As far as I can tell, it is related to floating point: 0.07 is not exactly representable as a binary floating-point number.

p <- seq(0, 0.1, by = 0.001)
q <- quantile(seq(1, 100, 1), p, type=2)
plot(p, q, type = "b")
abline(v = 0.07, col = "grey")

[Figure: type-2 quantile q plotted against p, with a grey vertical line at p = 0.07]

If you think of the type-2 quantile as a function of p, the averaged value is attained only at the isolated points where n*p is exactly an integer. Since 100*0.07 comes out slightly above 7, you never evaluate the function at exactly 0.07, hence your results. Try, e.g., decreasing the by argument in the code above. In that sense, the function returns exactly what is expected. In practice, with continuous data, I cannot imagine it would be of any consequence (though I know that is a poor argument).
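To make the mechanism concrete, here is a minimal sketch, assuming the type-2 rule is "average the two neighbouring order statistics when n*p is an integer, otherwise take the next order statistic". The helpers type2_from_np, type2_naive and type2_tol are hypothetical names written for illustration, not what quantile() does internally, and the tolerance of 1e-9 is an arbitrary choice.

```r
# Hypothetical helper: apply the type-2 rule to a given value of n * p.
# Assumes 0 < np < length(x).
type2_from_np <- function(x, np) {
  x <- sort(x)
  j <- floor(np)
  # Average the two neighbours only when n * p is exactly an integer.
  if (np > j) x[j + 1] else (x[j] + x[j + 1]) / 2
}

# No tolerance: n * p is used exactly as floating point computes it.
type2_naive <- function(x, p) type2_from_np(x, length(x) * p)

# Assumed variant: snap n * p to the nearest integer when it is within tol.
type2_tol <- function(x, p, tol = 1e-9) {
  np <- length(x) * p
  r <- round(np)
  if (abs(np - r) < tol) np <- r
  type2_from_np(x, np)
}

x <- seq(1, 100, 1)
type2_naive(x, 0.06)  # 6.5, because 100 * 0.06 == 6
type2_naive(x, 0.07)  # 8, because 100 * 0.07 is slightly above 7
type2_tol(x, 0.07)    # 7.5, once the tiny excess is ignored
```

The naive version reproduces the results in the question, while the tolerant version returns the average at the 7th percentile, which suggests the discrepancy comes from the product n*p rather than from the averaging rule itself.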

Anders Ellern Bilgrau