
I can determine the percentiles of a numeric column in a data frame using the `quantile()` function. However, I do not know how to get the percentile of a specific value from that same column without repeatedly trying different values of `probs`. How can I determine the percentile rank of a specific value within a distribution of values?

Example Data

```r
set.seed(1234)
df <- data.frame(value = rnorm(100, mean = 50, sd = 20))

# calculate the 10th, 20th, ..., 90th percentiles of the column
quantile(df$value, probs = seq(.1, .9, by = .1))
```
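For comparison, the "plug in different `probs`" approach the question wants to avoid can be automated as a grid search. This is a minimal sketch that assumes a grid step of 0.001. `percentile_by_search()` is a hypothetical helper, not a base R function. Because `quantile()` interpolates between order statistics (default `type = 7`), the result only approximates the proportion of values below the target.

```r
set.seed(1234)
df <- data.frame(value = rnorm(100, mean = 50, sd = 20))

# Hypothetical helper: invert quantile() by brute force.
# Evaluates quantile() on a fine grid of probabilities and returns the
# largest probability whose quantile does not exceed `target`.
percentile_by_search <- function(x, target, step = 0.001) {
  probs <- seq(0, 1, by = step)
  q <- quantile(x, probs = probs, names = FALSE)
  below <- probs[q <= target]
  # target lies below the sample minimum: no quantile qualifies
  if (length(below) == 0) return(0)
  max(below)
}

percentile_by_search(df$value, 33)
```

The answer depends on both the grid resolution and the quantile `type`, which is why looking up the empirical distribution directly is simpler.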

How would I determine the percentile for the value 33, based on the distribution of values in `df$value`?

tassones

  • So you just want the proportion of the sample less than 33? You can get that with `mean(df$value < 33)` – MrFlick Sep 09 '22 at 17:27
  • If you need to do this for multiple numbers, you could also use `ecdf()`. For example `ecdf(df$value)(33)` or `ecdf(df$value)(c(30,33,35))` – MrFlick Sep 09 '22 at 17:34
  • @MrFlick the result I get for `ecdf(df$value)(33)` is 0.27. Would saying "the value 33 is the 27th percentile for the distribution of values" be a proper interpretation of that result? – tassones Sep 09 '22 at 18:10
  • Well, a percentile like that doesn't have to be unique. For example, both 18 and 19 would be classified as the 4th percentile. It all depends on how you want to interpret that statement. It is accurate to say that 27% of observed values are less than 33. – MrFlick Sep 09 '22 at 18:21
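The approaches suggested in the comments can be collected into one runnable sketch. `mean(df$value < 33)` counts values strictly below 33, while the function returned by `ecdf()` counts values less than or equal to its argument. For continuous data such as this sample the two agree unless 33 itself occurs in the data. Multiply by 100 to report the result on a 0 to 100 percentile scale. The `percentile_rank()` helper is hypothetical (not part of base R). It illustrates one common convention for picking a single value when ties make the percentile ambiguous: count the values below the target plus half of the values equal to it.

```r
set.seed(1234)
df <- data.frame(value = rnorm(100, mean = 50, sd = 20))

# Proportion of observations strictly below 33
mean(df$value < 33)

# ecdf() builds a step function: Fn(t) = proportion of observations <= t.
# It is vectorised, so several values can be looked up in one call.
Fn <- ecdf(df$value)
Fn(33)
Fn(c(30, 33, 35))

# Express as a percentile on a 0-100 scale
100 * Fn(33)

# Hypothetical helper (not base R): "midpoint" percentile rank that counts
# values below the target plus half of the values tied with it
percentile_rank <- function(x, target) {
  vapply(target, function(t) mean(x < t) + 0.5 * mean(x == t), numeric(1))
}
percentile_rank(c(10, 20, 20, 30), 20)  # 0.25 + 0.5 * 0.5 = 0.5
```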
