1

I have a data frame df that has a column indicating the grade level of each participant in that data frame. Grade levels range from "K" to "9". I have another table called "quantile" that contains the cut-off values of each quantile of the test score for each grade level. How can I create a new column in the df data which indicates the quantile of each students' test score based on the reference table "quantile"? The rule is that if a test score's value is less than the value of the cut-off, that test score belongs to the quantile of the cut-off value. And, if the test score is between 2 cut-off values, that test score is classified as the greater quantile. Thank you much!

Below is the dummy data and the result table I was looking for:

df <- data.frame("Name" = c("John", "Mary", "Emily"), "Grade" = c("8","9","10"), "Test Score" = c(5,61,60))
df

quantile <- data.frame("Grade" = c("8","9","10"), "Quantile 1" = c(10,15,20), "Quantile 2" = c(50,60,70),
                       "Quantile 3" = c(60,80,100))
quantile

result <- data.frame("Name" = c("John", "Mary", "Emily"), "Grade" = c("8","9","10"),"Test Score" = c(5,61,60), 
                     "Quantile" = c("Quantile 1","Quantile 3","Quantile 2") )

result

Ann Ng
  • 45
  • 5
  • I voted for reopen since I'm no longer convinced this is "just a 'How to join'"-question. – Martin Gal Aug 10 '21 at 22:59
  • 1
    @MartinGal: I think it was automatically closed yesterday! I did not close it, but I also got the workable solution! – Ann Ng Aug 11 '21 at 17:49

1 Answers1

3

We reshape the 'quantile' data from wide to 'long', then join with the 'df' dataset, and do a group by summarise to return the first 'Quantile' value where 'Test.Score' is less than 'value'

library(dplyr)
library(tidyr)
quantile %>%
    pivot_longer(cols = -Grade, names_to = 'Quantile') %>% 
    left_join(df) %>% 
    group_by(Name, Grade) %>%
    summarise(Test.score = first(Test.Score), 
     Quantile = Quantile[which(Test.Score < value)[1]],
        .groups = 'drop') %>% 
    arrange(match(Name, df$Name))

-output

# A tibble: 3 x 4
  Name  Grade Test.score Quantile  
  <chr> <chr>      <dbl> <chr>     
1 John  8              5 Quantile.1
2 Mary  9             61 Quantile.3
3 Emily 10            60 Quantile.2
akrun
  • 874,273
  • 37
  • 540
  • 662
  • sorry, I failed to mention this in the initial question. Supposing that I don't want to merge the 2 tables, but just want to create another column "quantile" in the df data set. This new column notes whether a student's test score belongs to which quantile based on their test score and the reference table. Can you advise in this case? I added another column in the df data set accordingly. – Ann Ng Aug 10 '21 at 20:26
  • @AnnNg can you show the expeceted ouptut in the post. Do you want both the quantile columns or not – akrun Aug 10 '21 at 20:29
  • @AnnNg i read your dscription as `How can I create a new column in the df data which indicates the value of quantile 1 for each student?` that implies only quantile 1 – akrun Aug 10 '21 at 20:30
  • I adjusted my question as above. sorry for the confusion. hope it's clearer now! @akrun – Ann Ng Aug 10 '21 at 20:40
  • can I ask another question as I develop my analysis? how would you adjust the above codes to group all test scores which are more than the cut-off values of Quantile 3 into another group, say "Quantile 4"? – Ann Ng Aug 15 '21 at 21:23
  • @AnnNg Did you meant `summarise(Test.score = first(Test.Score), Quantile = if(value[Quantile == 'Quantile.3'] > first(Test.score)) 'Quantile.4' else Quantile[which(Test.Score < value)[1]])` – akrun Aug 15 '21 at 21:30
  • 1
    oh never mind, I played around with the codes and found a way to assign another category for values larger than Quantile 3's cut-off value. – Ann Ng Aug 15 '21 at 21:30