
I'm new to R and am trying to figure out how to create a new variable based on the frequency of another variable in a data frame. I have many observations and would like to group them into small (fewer than 15 observations), medium (15 to 20 observations) and large (more than 20 observations); that is, I am trying to recode class_size into an ordinal variable. For example, if I have the following data:

df <- data.frame(student_id = c("A","B","C","D","E","F","G","H","I","J"),
                 class_size = c(10,15,20,15,35,25,11,40,40,10))

I'd like to get the following results:

student_id  class_size  new_class_size
   A            10          small
   B            15          medium
   C            20          medium
   D            15          medium
   E            35          large
   F            25          large
   G            11          small
   H            40          large
   I            40          large
   J            10          small

I looked at dplyr's case_when() function but couldn't get it to give me what I was looking for. How do I recode the class_size variable in R?
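For reference, a case_when() version might look like the sketch below. It assumes dplyr is installed and, like the question, treats 15 and 20 as medium. case_when() checks its conditions from top to bottom and the first match wins, so the "medium" condition only needs an upper bound. student_id is built with LETTERS[1:10], which is equivalent to quoting "A" to "J".

```r
library(dplyr)

df <- data.frame(
  student_id = LETTERS[1:10],   # "A" to "J"
  class_size = c(10, 15, 20, 15, 35, 25, 11, 40, 40, 10)
)

df <- df %>%
  mutate(new_class_size = case_when(
    class_size < 15  ~ "small",   # fewer than 15
    class_size <= 20 ~ "medium",  # 15 to 20; smaller sizes matched above
    class_size > 20  ~ "large"    # more than 20; NA sizes stay NA
  )) %>%
  # Store the labels as an ordered factor: small < medium < large
  mutate(new_class_size = factor(new_class_size,
                                 levels = c("small", "medium", "large"),
                                 ordered = TRUE))
```

Because every boundary is written out explicitly, this version does not rely on class sizes being whole numbers.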

cajt

1 Answer


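We could use cut(), giving the break points in breaks and the category names in labels. Note that cut() builds right-closed intervals (a, b] by default, so the breaks are placed at 14 and 20: for whole-number class sizes this gives small (< 15), medium (15 to 20) and large (> 20). Setting ordered_result = TRUE returns an ordered factor, i.e. an ordinal variable.

library(dplyr)
df <- df %>%
   mutate(new_class_size = cut(class_size,
      breaks = c(-Inf, 14, 20, Inf),   # (-Inf,14], (14,20], (20,Inf]
      labels = c("small", "medium", "large"),
      ordered_result = TRUE))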
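Where the boundaries fall depends on how cut() closes its intervals. With the default right = TRUE each bin is (a, b], so a break placed exactly at 15 sends a size of 15 to the lower bin. The base-R sketch below compares breaks at 15 and at 14 on the boundary values 14, 15, 20 and 21; it assumes class sizes are whole numbers, which is what makes a break at 14 mean "fewer than 15".

```r
sizes <- c(14, 15, 20, 21)
labs  <- c("small", "medium", "large")

# Breaks at 15 and 20 give (-Inf,15], (15,20], (20,Inf]:
# a size of exactly 15 lands in "small"
at_15 <- cut(sizes, breaks = c(-Inf, 15, 20, Inf), labels = labs)

# Breaks at 14 and 20 give (-Inf,14], (14,20], (20,Inf]:
# for whole numbers that is < 15, 15 to 20, > 20
at_14 <- cut(sizes, breaks = c(-Inf, 14, 20, Inf), labels = labs,
             ordered_result = TRUE)

print(data.frame(size = sizes, at_15 = at_15, at_14 = at_14))
```

Switching to right = FALSE alone would not solve this: it makes the intervals [a, b), which moves 15 into "medium" but also pushes 20 into "large".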

-data

df <- structure(list(student_id = c("A", "B", "C", "D", "E", "F", "G", 
"H", "I", "J"), class_size = c(10, 15, 20, 15, 35, 25, 11, 40, 
40, 10)), class = "data.frame", row.names = c(NA, -10L))
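Since the goal is an ordinal variable, it may help to see what an ordered factor gives you in practice. This sketch rebuilds the example data and a cut() call so it runs on its own; the breaks are at 14 and 20 so that, assuming whole-number sizes, 15 falls in medium as the question specifies. It then uses the level order for a comparison and for a frequency table.

```r
df <- data.frame(
  student_id = LETTERS[1:10],
  class_size = c(10, 15, 20, 15, 35, 25, 11, 40, 40, 10)
)

# Whole-number sizes: (-Inf,14] small, (14,20] medium, (20,Inf] large
df$new_class_size <- cut(df$class_size,
                         breaks = c(-Inf, 14, 20, Inf),
                         labels = c("small", "medium", "large"),
                         ordered_result = TRUE)

# Ordered factors can be compared against a level name
at_least_medium <- df[df$new_class_size >= "medium", ]

# Counts per category, listed in level order rather than alphabetically
counts <- table(df$new_class_size)
print(counts)
```

With a plain (unordered) factor the >= comparison would not be meaningful, which is the practical difference ordered_result = TRUE makes.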
akrun