I am stuck trying to do something that should be simple: Use grep() to test pattern matching on a string for multiple variables in a single dataframe. All searches for this lead me to instructions on how to grep() on multiple patterns.
Create data:
df <- data.frame(a = c("apple", "plum", "pair", "apple"),
                 b = c(1, 2, 3, 4),
                 c = c("plum", "apple", "grape", "orange"))
df
      a b      c
1 apple 1   plum
2  plum 2  apple
3  pair 3  grape
4 apple 4 orange
Now I want to check df$a and df$c for the string "apple". I want to do this because I want the values from df$b for all rows with "apple" in either df$a or df$c.
My hope was to create a function, f(x)::grep("apple", df$x), and use lapply to test it over the list of variable names that I want to check for the pattern:
check_apple <- function(x) {
  grep("apple", df$x)
}
But this doesn't work:
check_apple(a)
integer(0)
However this does work:
grep("apple", df$a)
[1] 1 4
Why doesn't this function work? Can I not use a variable name as an argument in my function?
My plan was to apply the function to all the variables and then collapse the resulting list to a single vector before selecting unique() values, to get all the rows in the dataframe that have a string match in any of those variables. It goes without saying that my dataset is much larger than this example.
Can I fix the function, or is there another way to run grep() over multiple variables?
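For reference, here is a minimal sketch of the plan described above. It assumes the column names are passed as strings and that the function indexes with df[[x]] rather than df$x, since `$` does not evaluate its argument and instead looks for a column literally named "x". The `data` argument and the names `cols`, `hits`, and `rows` are illustrative only and not part of the original code.

```r
# Same example data as above.
df <- data.frame(a = c("apple", "plum", "pair", "apple"),
                 b = c(1, 2, 3, 4),
                 c = c("plum", "apple", "grape", "orange"))

# Hypothetical variant of check_apple(): takes the column name as a string.
# df$x looks for a column literally named "x"; data[[x]] uses the value of x.
check_apple <- function(x, data = df) {
  grep("apple", data[[x]])
}

cols <- c("a", "c")                  # columns to search (assumed selection)
hits <- lapply(cols, check_apple)    # one vector of matching row indices per column
rows <- sort(unique(unlist(hits)))   # collapse to a single vector of row numbers
df$b[rows]                           # values of b for rows matching in a or c
```

With the example data this should select rows 1, 2, and 4. Whether this scales well to a much larger dataset is untested here.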