I have a data set in R that contains students' majors, and I want to add a new variable for each observation that shows which school the student belongs to, based on the major.

```r
## function to find school by major
findschool <- function(major) {
  art_majors <- c("Art", "Design", "Classics", "French", "German", "Russian", "Spanish", "Romance Language", "English", "Linguistics", "Music", "Theatre", "Creative Writing")
  business_majors <- c("Accounting", "Business Administration")
  health_majors <- c("Communication Disorders", "Health Science", "Exercise Science", "Athletic Training", "Nursing")
  science_majors <- c("Agricultural Science", "Biology", "Chemistry", "Computer Science", "Mathematics", "Physics", "Statistics")
  social_majors <- c("Communication", "Economics", "History", "Justice Systems", "Philosophy & Religion", "Political Science", "Psychology", "Sociology/Anthropology")
  ##major = format(as.character(major))
  print(major)
  if (any(str_detect(art_majors, major))) {
    "Arts & Letters"
  } else if (any(str_detect(business_majors, major))) {
    "Business"
  } else if (any(major %in% health_majors)) {
    "Health Sciences & Education"
  } else if (any(major %in% science_majors)) {
    "Science & Mathematics"
  } else if (any(major %in% social_majors)) {
    "Social & Cultural Studies"
  } else if (any(major %in% c("Undeclared", NA))) {
    NA
  } else {
    "Interdisciplinary Studies"
  }
}

## for full data
checkins.clean <- checkins.clean %>%
  mutate(School = findschool(major = trimws(Primary.Major)))
```

This is not working for me. I have tried `str_detect` as well as the plain comparison operator `==`, but every observation still gets either the very first result, "Arts & Letters", or the last one from the `else` block, "Interdisciplinary Studies". Occasionally I also get the warning "the condition has length > 1 and only the first element will be used" and another one about object lengths. How can I return the correct school for each row?

Here is a sample input:

```r
checkins.clean <- data.frame("SN" = 1:2, "Name" = c("John", "Dora"), "Major" = c("Computer Science", "English"))
```

I would need an output that looks like this:

```r
output <- data.frame("SN" = 1:2, "Name" = c("John", "Dora"), "Major" = c("Computer Science", "English"), "School" = c("Science & Mathematics", "Arts & Letters"))
```

Edit: Solved using `Vectorize()`. I added another function, `findschool_v <- Vectorize(findschool)`, and called `findschool_v` inside the `mutate()` block.
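For readers who want to see the shape of that fix, here is a minimal base R sketch. It is not the exact code from the question: the major lists are shortened, the `str_detect` calls are replaced with `%in%`, and the sample data frame's `Major` column is used instead of `Primary.Major`. All of these are assumptions made to keep the example self-contained.

```r
# Scalar version of findschool(): expects ONE major per call.
# The school lists are shortened here (an assumption for brevity).
findschool <- function(major) {
  art_majors <- c("Art", "English", "Music")
  science_majors <- c("Biology", "Computer Science", "Mathematics")
  if (is.na(major) || major == "Undeclared") {
    NA_character_
  } else if (major %in% art_majors) {
    "Arts & Letters"
  } else if (major %in% science_majors) {
    "Science & Mathematics"
  } else {
    "Interdisciplinary Studies"
  }
}

# Vectorize() wraps the scalar function so it is called once per element.
# USE.NAMES = FALSE keeps the result from being named after the inputs.
findschool_v <- Vectorize(findschool, USE.NAMES = FALSE)

# Sample input from the question
checkins.clean <- data.frame(
  SN = 1:2,
  Name = c("John", "Dora"),
  Major = c("Computer Science", "English")
)

# Base R equivalent of calling findschool_v() inside mutate()
checkins.clean$School <- findschool_v(trimws(checkins.clean$Major))
print(checkins.clean)
```

`Vectorize()` is a thin wrapper around `mapply()`, so the function body still runs once per row. That is usually fine for a column of this size, but it is slower than a function built from vectorized operations.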

Khatevate
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jul 23 '20 at 20:51
  • You will need to make sure your function is "vectorized" so that it will work when you pass in a vector of values at a time, which is how most functions in R work. When you use `if` like this to test one value, your function will not be vectorized. Or use `rowwise()` to make sure you only operate on one value at a time (though that can be less efficient in some cases). – MrFlick Jul 23 '20 at 20:53
  • Maybe check out https://deanattali.com/blog/mutate-non-vectorized/ or https://stackoverflow.com/questions/28579361/mutate-transform-in-r-dplyr-pass-custom-function – MrFlick Jul 23 '20 at 20:54
  • @MrFlick Thanks for that! I am confused by that article though. Do I just need to make another function that basically contains the `Vectorize()` keyword to solve my problem? – Khatevate Jul 23 '20 at 21:03
  • Or you need to change your function so it only uses vectorized functions internally. For example `case_when` is a vectorized function for re-coding while `if/else` blocks are not. – MrFlick Jul 23 '20 at 21:07
  • @MrFlick Thanks! Using `Vectorize()` solved my problem. – Khatevate Jul 23 '20 at 21:10
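The `case_when` route mentioned in the comments can be sketched as follows. The original function returns a single value because `if` tests only one condition and `any(...)` collapses the whole column into one `TRUE`/`FALSE`, so every row receives the same school. In R 4.2.0 and later, a condition of length greater than one in `if` is an error, not a warning. The function name `findschool_cw` and the shortened major lists are hypothetical, chosen only for illustration, and the sample data frame's `Major` column is assumed.

```r
library(dplyr)

# Vectorized re-coding: each condition is evaluated element-wise,
# and the first matching condition decides the value for that row.
# The school lists are shortened here (an assumption for brevity).
findschool_cw <- function(major) {
  art_majors <- c("Art", "English", "Music")
  science_majors <- c("Biology", "Computer Science", "Mathematics")
  case_when(
    is.na(major) | major == "Undeclared" ~ NA_character_,
    major %in% art_majors ~ "Arts & Letters",
    major %in% science_majors ~ "Science & Mathematics",
    TRUE ~ "Interdisciplinary Studies"  # fallback for everything else
  )
}

# Sample input from the question
checkins.clean <- data.frame(
  SN = 1:2,
  Name = c("John", "Dora"),
  Major = c("Computer Science", "English")
)

checkins.clean <- checkins.clean %>%
  mutate(School = findschool_cw(trimws(Major)))
print(checkins.clean)
```

Because every step is vectorized, this version needs neither `Vectorize()` nor `rowwise()`, and it returns one school per row in a single call.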
