I have a dataset containing several variables, two of which are age and wage. They are both numeric variables. I want to create a function in R that will group the variable age as follows:

  • less than or equal to 35: "young"
  • over 35 and less than or equal to 55: "adult"
  • over 55: "old"

I then want to use the `tapply` function to compute the average wage for these three groups.

This is the code I have:

age<-our_data$age
wage<-our_data$wage

my_function <- function(age) {
  for (i in 1:length(age)) {
    if (i <= 35) {
      i <- "young"
    } else if (i > 35 & i <= 55) {
      i <- "adult"
    } else if (i > 55) {
      i <- "old"
    }
  }
}

tapply(wage, my_function(age), mean)

However, it does not run. R reports that the arguments must have the same length, even though both wage and age have length 534.
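
The length mismatch can be reproduced on a small example. The sketch below uses a made-up four-element `age` vector as a stand-in for `our_data$age`. It suggests that the mismatch comes from the function's return value rather than from `wage` or `age`: the loop compares the index `i` (not `age[i]`) against the cut-offs, and because a `for` loop is the last expression in the function, the function returns `NULL`.

```r
# Hypothetical stand-in for our_data$age (the real column has 534 values)
age <- c(22, 40, 60, 35)

my_function <- function(age) {
  for (i in 1:length(age)) {
    # `i` is the position (1, 2, 3, ...), not the age at that position
    if (i <= 35) {
      i <- "young"   # overwrites the loop variable; nothing is stored
    } else if (i > 35 & i <= 55) {
      i <- "adult"
    } else if (i > 55) {
      i <- "old"
    }
  }
  # no return value: a for loop evaluates to NULL
}

result <- my_function(age)
is.null(result)   # TRUE
length(result)    # 0
```

Since the grouping argument passed to `tapply` then has length 0 while `wage` has length 534, the two arguments do not have the same length.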

Phil
  • You can just do `our_data$age <- dplyr::case_when(our_data$age <= 35 ~ "young", our_data$age <= 55 ~ "adult", our_data$age > 55 ~ "old")` – Phil Oct 11 '22 at 00:24
  • Generally, R vectorizes by default, which means that you never have to build a loop in the manner that you are attempting here. – Phil Oct 11 '22 at 00:25
  • `if() {} else{}` only works on single values. For vectors/columns, use `ifelse` (or `case_when` from `dplyr`). For binning in particular, you could also use the `cut` function, `cut(age, breaks = c(-Inf, 35, 55, Inf))` – Gregor Thomas Oct 11 '22 at 03:31
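
Putting the `cut` suggestion from the comments together with `tapply` might look like the sketch below. The small `our_data` data frame and the function name `age_group` are made up for illustration, and the `labels` argument is an addition to the `cut` call quoted in the comment.

```r
# Hypothetical stand-in for our_data (the real data has 534 rows)
our_data <- data.frame(
  age  = c(22, 35, 36, 55, 56, 70),
  wage = c(10, 12, 20, 22, 15, 17)
)

# Vectorized grouping: returns one label per element of `age`.
# cut() uses right-closed intervals by default, so the bins are
# (-Inf, 35], (35, 55] and (55, Inf).
age_group <- function(age) {
  cut(age,
      breaks = c(-Inf, 35, 55, Inf),
      labels = c("young", "adult", "old"))
}

groups <- age_group(our_data$age)

# Same length as wage, so tapply can compute one mean per group
avg_wage <- tapply(our_data$wage, groups, mean)
avg_wage
```

With this toy data the group means are 11 for "young", 21 for "adult" and 16 for "old". Because `age_group` returns a factor as long as its input, the same call should work unchanged on the real `our_data$wage` and `our_data$age` columns.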
