I have a data set in R with a column of majors, and I want to add a new variable that gives, for each observation, the school it belongs to based on its major.
## function to find school by major
findschool <- function(major) {
art_majors <- c("Art", "Design", "Classics", "French", "German", "Russian", "Spanish", "Romance Language", "English", "Linguistics", "Music", "Theatre", "Creative Writing")
business_majors <- c("Accounting", "Business Administration")
health_majors <- c("Communication Disorders", "Health Science", "Exercise Science", "Athletic Training", "Nursing")
science_majors <- c("Agricultural Science", "Biology", "Chemistry", "Computer Science", "Mathematics", "Physics", "Statistics")
social_majors <- c("Communication", "Economics", "History", "Justice Systems", "Philosophy & Religion", "Political Science", "Psychology", "Sociology/Anthropology")
##major = format(as.character(major))
print(major)
if (any(str_detect(art_majors, major))) {
"Arts & Letters"
} else if (any(str_detect(business_majors, major))) {
"Business"
} else if (any(major %in% health_majors)) {
"Health Sciences & Education"
} else if (any(major %in% science_majors)) {
"Science & Mathematics"
} else if (any(major %in% social_majors)) {
"Social & Cultural Studies"
} else if (any(major %in% c("Undeclared", NA))) {
NA
} else {
"Interdisciplinary Studies"
}
}
## for full data
checkins.clean <- checkins.clean %>%
mutate(School = findschool(major = trimws(Primary.Major)))
This is not working for me. I have tried str_detect as well as a plain equality comparison ('=='), but every observation still gets the same value: either the very first result, "Arts & Letters", or the very last one from the else block, "Interdisciplinary Studies". Occasionally I also get the warning "the condition has length > 1 and only the first element will be used" and another warning about object length. How can I return the correct school for each row?
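For context, the symptoms above are consistent with mutate passing the whole column to findschool at once: any(...) then collapses every row into a single TRUE/FALSE, so one value is returned and recycled to all observations. A minimal sketch of a version that works on a whole vector, using a lookup table in base R, might look like the following. The shortened major lists and the names school_lookup, major_to_school and find_school_vec are illustrative assumptions, not part of the original code.

```r
# Hypothetical, shortened lookup table: school name -> majors (not the full lists).
school_lookup <- list(
  "Arts & Letters"        = c("Art", "English", "Music"),
  "Business"              = c("Accounting", "Business Administration"),
  "Science & Mathematics" = c("Biology", "Computer Science", "Statistics")
)

# Flatten into a named vector: names are majors, values are schools.
major_to_school <- setNames(
  rep(names(school_lookup), lengths(school_lookup)),
  unlist(school_lookup, use.names = FALSE)
)

# find_school_vec is an illustrative name; it handles a whole column at once.
find_school_vec <- function(major) {
  major <- trimws(as.character(major))
  school <- unname(major_to_school[major])
  # Majors not in the table fall back to the catch-all school,
  # except missing or undeclared majors, which stay NA.
  fallback <- is.na(school) & !is.na(major) & major != "Undeclared"
  school[fallback] <- "Interdisciplinary Studies"
  school
}

majors <- c("Computer Science", " English ", "Undeclared", NA, "Film Studies")
result <- find_school_vec(majors)
print(result)
```

Because the lookup is indexed by the whole vector, there is no if statement to receive a condition of length greater than one, and the function could be called directly inside mutate.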
Here is a sample input:
checkins.clean <- data.frame("SN" = 1:2, "Name" = c("John", "Dora"), "Major" = c("Computer Science", "English"))
I would need an output that looks like this:
output <- data.frame("SN" = 1:2, "Name" = c("John", "Dora"), "Major" = c("Computer Science", "English"), "School" = c("Science & Mathematics", "Arts & Letters"))
Edit: Solved using Vectorize
I added another function:
findschool_v <- Vectorize(findschool)
and called findschool_v inside the mutate block.
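As a rough, self-contained sketch of that fix on the sample data: the findschool below is a simplified stand-in that checks one major at a time with %in% and shortened major lists (both assumptions), and the column is assigned with base R rather than mutate so that no packages are needed. USE.NAMES = FALSE is an optional extra that keeps the result from carrying the majors as names.

```r
# Simplified stand-in for findschool: handles ONE major at a time (assumed lists).
findschool <- function(major) {
  art_majors <- c("Art", "English", "Music")
  science_majors <- c("Biology", "Computer Science", "Statistics")
  if (is.na(major) || major == "Undeclared") {
    NA_character_
  } else if (major %in% art_majors) {
    "Arts & Letters"
  } else if (major %in% science_majors) {
    "Science & Mathematics"
  } else {
    "Interdisciplinary Studies"
  }
}

# Vectorize() wraps the scalar function so it is applied element by element.
findschool_v <- Vectorize(findschool, USE.NAMES = FALSE)

checkins.clean <- data.frame(
  SN = 1:2,
  Name = c("John", "Dora"),
  Major = c("Computer Science", "English")
)

# Base R assignment; inside mutate this would be School = findschool_v(trimws(Major)).
checkins.clean$School <- findschool_v(trimws(checkins.clean$Major))
print(checkins.clean)
```

Vectorize calls the function once per element, which is simple but slower than a true vectorized lookup on large data.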