-1

I know how to get the 25%, 50% and 75% percentiles:

set.seed(123)
a <- rnorm(100)
quantile(a)

What if I want to know the percentile rank of a[13]?

I have seen this similar question [https://stackoverflow.com/questions/21219447/calculating-percentile-of-dataset-column], but it is not what I want.

For example:

If I want the 13th percentile, I can use this:

quantile(a, probs = 0.13)

Then I get:

>      13% 
>-1.019541 

But this is not what I want. I want the percentile rank of a given element within my vector a.

For example, to get the percentile rank of a[13], the function might look like this:

get_percentile_value(a[13])

> 16.26%

Then I would know that a[13] (0.4007715) ranks at 16.26% within a.

Is there any way to do that in R?

Any help will be highly appreciated!

zhiwei li
  • 1
    Can you explain where the 16.26% comes from? Do any of the options from `rank()` or the dplyr wrappers such as `dplyr::percent_rank()` or `dplyr::cume_dist()` do what you want? – Calum You Sep 18 '20 at 01:27
  • 1
    Please don't post the same question multiple times, especially when it already has answers on SO – Cath Sep 18 '20 at 07:13

2 Answers

2

What you are looking for is related to the empirical distribution function, although it doesn't strictly fit the bill because you're not necessarily looking at a probability distribution per se (although you are in your example).

In any case, here is a simple approach:



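pctl = function(vector, value){
  # Count the elements of `vector` that are <= `value` (TRUE is coerced to 1),
  # then divide by the number of elements to get a proportion in [0, 1]
  out = sum(value >= vector) / length(vector)
  return(out)
}

set.seed(666)  # call set.seed(); assigning `set.seed = 666` would not seed the RNG

a = rnorm(100)
pctl(a, a[13])

> .85 in the original, unseeded run; the exact value depends on the random draw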

This compares your test value against every element of the vector, sums the resulting logical vector (each TRUE is coerced to 1) to count the elements that are less than or equal to the test value, and then divides by the total number of observations to get a proportion. Multiply by 100 for a percentage.
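The same proportion can also be obtained from base R's ecdf(), which builds the empirical cumulative distribution function of a vector and returns it as a function. A minimal sketch, assuming the question's vector a (seed 123); the helper name fmt_pct is hypothetical and only formats the result as a percentage string:

```r
set.seed(123)
a <- rnorm(100)

# Manual version: proportion of values in a that are <= a[13]
p_manual <- mean(a <= a[13])

# ecdf(a) returns a function; evaluating it at a[13] gives the same proportion
p_ecdf <- ecdf(a)(a[13])

# Hypothetical helper: format a proportion as a percentage string
fmt_pct <- function(p, digits = 2) {
  paste0(round(p * 100, digits), "%")
}

# Both approaches agree
stopifnot(isTRUE(all.equal(p_manual, p_ecdf)))
fmt_pct(p_ecdf)
```

Both versions count values less than or equal to the test value, so the largest element of the vector always gets 100%.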

John
0

Messy and inefficient Base R solution (should do exactly what you are after):

get_percentile_value <- function(vec_w_idx){
  # Store argument as string: val => string scalar
  val <- deparse(substitute(vec_w_idx))
  # Extract the index from the argument: idx => integer scalar
  idx <- as.integer(gsub("(.*[[])(\\d+)[]]", "\\2", val))
  # Pull vector referenced in argument from Global Environment: 
  # vec => numeric vector
  vec <- eval(parse(text = gsub("(^\\w+)\\[.*", "\\1", val)))
  # Calculate the percentile rank of each value in the vector: 
  # pc_rnk => data.frame
  pc_rnk <- data.frame(srt_vec = sort(vec), pc_rnk = seq_along(vec)/length(vec))
  # Lookup the percentile rank and store it as a vector: res => data.frame
  res <- data.frame(vec = vec, pc_rnk = pc_rnk$pc_rnk[match(vec, pc_rnk$srt_vec)])
  # Return the percentile rank of the value at given index: 
  # double scalar => .GlobalEnv()
  return(res$pc_rnk[idx])
}

# Apply function to the element from the question: double scalar => stdout (console)
get_percentile_value(a[13])

Or if the output must match exactly what you requested:

# Function to take a vector (with index provided), and return 
# a percentile rank: get_percentile_value => function() 
get_percentile_value <- function(vec_w_idx){
  # Store argument as string: val => string scalar
  val <- deparse(substitute(vec_w_idx))
  # Extract the index from the argument: idx => integer scalar
  idx <- as.integer(gsub("(.*[[])(\\d+)[]]", "\\2", val))
  # Pull vector referenced in argument from Global Environment: 
  # vec => numeric vector
  vec <- eval(parse(text = gsub("(^\\w+)\\[.*", "\\1", val)))
  # Calculate the percentile rank of each value in the vector: 
  # pc_rnk => data.frame
  pc_rnk <- data.frame(srt_vec = sort(vec), pc_rnk = seq_along(vec)/length(vec))
  # Lookup the percentile rank and store it as a vector: res => data.frame
  res <- data.frame(vec = vec, pc_rnk = pc_rnk$pc_rnk[match(vec, pc_rnk$srt_vec)])
  # Return the percentile rank of the value at given index, 
  # formatted as a percentage: string scalar
  return(paste0(round(res$pc_rnk[idx] * 100, 4), "%"))
}
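
If passing the vector and the index as two separate arguments is acceptable, the string parsing can be avoided. The following is a minimal sketch under that assumption, not the function above; the name percentile_rank_at and its arguments are hypothetical. It uses base R's rank() with ties.method = "max", so tied values all count as less than or equal to each other (the match() lookup above effectively gives ties the lowest position instead).

```r
# Hypothetical helper: percentile rank of vec[idx] within vec
percentile_rank_at <- function(vec, idx, as_percent = FALSE) {
  stopifnot(is.numeric(vec), idx >= 1, idx <= length(vec))
  # With ties.method = "max", the rank of vec[idx] equals the number
  # of elements that are <= vec[idx]
  p <- rank(vec, ties.method = "max")[idx] / length(vec)
  if (as_percent) paste0(round(p * 100, 4), "%") else p
}

x <- c(10, 20, 20, 30, 40)
percentile_rank_at(x, 2)                     # 0.6
percentile_rank_at(x, 4, as_percent = TRUE)  # "80%"
```

With the question's data this would be called as percentile_rank_at(a, 13, as_percent = TRUE).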
hello_friend