
Our professor wants us to use the cumulative frequency distribution created for a dataset to find percentiles and percentile ranks. They are easy to calculate from the original data values, but I find it confusing to do so from a cumulative frequency distribution. How do you do that?

Thank you

**For percentiles, I tried:**

    library(dplyr)
    quantile(ftable$cum.freq, c(0.5, 0.25), type = 5)

**For percentile ranks, I tried:**

    idx_score_to_rank_a <- original_data == 41
    idx_score_to_rank_b <- original_data == 28

    unique(percent_rank(ftable$cum.freq)[idx_score_to_rank_a])
    unique(percent_rank(ftable$cum.freq)[idx_score_to_rank_b])

This second one only gives me a value for rank a, not rank b.

Edit:

This is my ftable:

       class.int freq rel.freq cum.freq cum.percent.dist
    1     (5,10]    5   0.0641        5             6.41
    2    (10,15]    9   0.1154       14            17.95
    3    (15,20]   17   0.2179       31            39.74
    4    (20,25]   15   0.1923       46            58.97
    5    (25,30]   11   0.1410       57            73.07
    6    (30,35]    8   0.1026       65            83.33
    7    (35,40]    3   0.0385       68            87.18
    8    (40,45]    4   0.0513       72            92.31
    9    (45,50]    2   0.0256       74            94.87
    10   (50,55]    2   0.0256       76            97.43
    11   (55,60]    2   0.0256       78            99.99
    12   (60,65]    0   0.0000       78            99.99

This is the question I want to answer:

Using the cumulative frequency distribution, determine the following percentiles: (a) P50, (b) P25.

And, using the cumulative frequency distribution, determine the following percentile ranks: (a) the percentile rank of a score of 41, (b) the percentile rank of a score of 28.

dput of table:

    > dput(ftable)
    structure(list(class.int = structure(1:12, levels = c("(5,10]",
    "(10,15]", "(15,20]", "(20,25]", "(25,30]", "(30,35]", "(35,40]",
    "(40,45]", "(45,50]", "(50,55]", "(55,60]", "(60,65]"), class = "factor"),
        freq = c(5L, 9L, 17L, 15L, 11L, 8L, 3L, 4L, 2L, 2L, 2L, 0L
        ), rel.freq = c(0.0641, 0.1154, 0.2179, 0.1923, 0.141, 0.1026,
        0.0385, 0.0513, 0.0256, 0.0256, 0.0256, 0), cum.freq = c(5L,
        14L, 31L, 46L, 57L, 65L, 68L, 72L, 74L, 76L, 78L, 78L), cum.percent.dist = c(6.41,
        17.95, 39.74, 58.97, 73.07, 83.33, 87.18, 92.31, 94.87, 97.43,
        99.99, 99.99)), row.names = c(NA, -12L), class = "data.frame")

This ftable was created from these scores:

    > dput(scores)
    c(10, 13, 22, 26, 16, 23, 35, 53, 17, 32, 41, 35, 24, 23, 27,
    16, 20, 60, 48, 43, 52, 31, 17, 20, 33, 18, 23, 8, 24, 15, 26,
    46, 30, 19, 22, 13, 22, 14, 21, 39, 28, 43, 37, 15, 20, 11, 25,
    9, 15, 21, 21, 25, 34, 10, 23, 29, 28, 18, 17, 24, 16, 26, 7,
    12, 28, 20, 36, 16, 14, 18, 16, 57, 31, 34, 28, 42, 19, 26)

  • You already have an ECDF function. You cannot use `quantile` unless you have the raw data. Show us some of the `ftable` object. It probably has two columns: X and cum.freq. Then tell us exactly what you want to get. – IRTFM Jan 23 '23 at 05:09
  • @IRTFM thank you for responding! I will add the requested information under my post as this comment box says it is too long. – Pawan Singh Jan 23 '23 at 05:21
  • Greetings! It is usually helpful to provide a minimal reproducible dataset for questions here so people can troubleshoot your problem (rather than a table like yours or a screenshot, for example). One way of doing this is to use the `dput` function on the data, or on a subset of the data you are using, and then paste the output into your question. You can find out how to use it here: https://youtu.be/3EID3P1oisg – Shawn Hemelstrand Jan 23 '23 at 05:42
  • You are supposed to [edit] the question itself. Do not use comments to add either data or code. – IRTFM Jan 23 '23 at 05:46
  • As should be obvious, that table is very coarse, and no class boundary has a cum.pct near 25 or 50, so you will need to estimate using some sort of interpolation. The 25th percentile is clearly between 15 (about the 18th percentile) and 20 (about the 40th percentile); 17.5 would be a reasonable estimate. Other forms of estimation would be possible if you fit a distribution to the data. – IRTFM Jan 23 '23 at 05:54
  • You can see an example of how this is typically done in the question here. It makes the job of the answerer easier: https://stackoverflow.com/questions/73725328/how-to-fill-blank-section-of-faceted-ggplot-in-r – Shawn Hemelstrand Jan 23 '23 at 06:43
  • @IRTFM, thank you. For the interpolation, is there a function in R for that, or should I do it manually using basic functions? – Pawan Singh Jan 23 '23 at 06:48
  • @ShawnHemelstrand added it just now! Thank you! – Pawan Singh Jan 23 '23 at 06:54
  • @ShawnHemelstrand, Thank you for informing me about this! – Pawan Singh Jan 23 '23 at 08:05
  • @PawanSingh There are both `approx` and `approxfun`, which will do linear interpolation. I expect that there are many SO examples using these two functions. I got 21 hits in a search for “approxfun ecdf”. The cum.pct is quite similar to an ECDF. – IRTFM Jan 23 '23 at 11:00
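
A minimal sketch of the interpolation suggested in the comments, using `approx` on the cumulative distribution. It assumes that scores are spread evenly within each class and that the printed class bounds (5, 10, ..., 65) are the exact limits; a textbook that uses real limits such as 10.5 would give slightly different answers. The helper names `percentile` and `percentile_rank` and the vectors `x` and `p` are made up for illustration, and `cum_freq` is a copy of `ftable$cum.freq` so that the block runs on its own.

```r
# Cumulative frequencies copied from ftable$cum.freq in the question.
cum_freq <- c(5, 14, 31, 46, 57, 65, 68, 72, 74, 76, 78, 78)
upper    <- seq(10, 65, by = 5)   # upper bounds of (5,10] ... (60,65]
n        <- max(cum_freq)         # total number of scores (78)

# Points on the cumulative curve: proportion of scores at or below each bound.
# The lower bound of the first class (5) is added with cumulative proportion 0.
x <- c(5, upper)
p <- c(0, cum_freq / n)

# Percentile rank: read the cumulative percent off the curve at a given score.
# ASSUMPTION: linear interpolation, i.e. scores are uniform within a class.
percentile_rank <- function(score) {
  100 * approx(x, p, xout = score)$y
}

# Percentile: invert the curve (cumulative proportion -> score).
# The empty (60,65] class repeats the last proportion, so drop the repeat
# to keep the inverse well defined.
keep <- !duplicated(p)
percentile <- function(pct) {
  approx(p[keep], x[keep], xout = pct / 100)$y
}

percentile(c(50, 25))        # P50 and P25
percentile_rank(c(41, 28))   # percentile ranks of scores 41 and 28
```

With the table above this gives roughly 22.67 and 16.62 for P50 and P25, and percentile ranks of roughly 88.2 and 67.4 for scores of 41 and 28. Because it interpolates linearly within each class, it should agree with the usual grouped-data formula L + ((k/100)N - F)/f * w, where L is the lower bound of the class containing the percentile, F the cumulative frequency below it, f the class frequency, and w the class width.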

0 Answers