I'm trying to create a binary set of variables that uses data across multiple columns. I have a dataset where I'm trying to create a binary variable where any column with a specific name will be indexed for a certain value. I'll use iris as an example dataset.
Let's say I want to create a variable where any column with the string "Sepal" and any row in those columns with the values of 5.1, 3.0, and 4.7 will become "Class A" while values with 3.1, 5.0, and 5.4 will be "Class B". So let's look at the first few entries of iris
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
The first 3 rows should then be under "Class A" While rows 4-6 will be under "Class B". I tried writing this code to do that
mutate(iris, Class = if_else(
vars(contains("Sepal")), any_vars(. %in% c(5.1,3.0, 4.7))), "Class A",
ifelse(vars(contains("Sepal")), any_vars(. %in% c(3.1, 5.0, 5.4))), "Class B",NA)
and received the error
Error: `condition` must be a logical vector, not a `quosures/list` object
So I've realized I need lapply
here, but I'm not even sure where to begin to write this because I'm not sure how to represent the entire part of selecting columns with "Sepal" in the name and also include the specific values in those rows as one list object to provide to lapply
This is clearly the wrong syntax
lapply(vars(contains("Sepal")), any_vars(. %in% c(5.1,3.0, 4.7)))
Examples using case_when
will also be accepted as answers.