temp = structure(list(name1 = structure(c(2L, 1L, 2L, 1L, 2L), .Label = c("Joe", 
"Mike"), class = "factor"), name2 = c("Nick", "Matt", "Nick", 
"Matt", "Nick"), name3 = c("Matt", "Tom", "Tom", "Steve", "Tom"
)), .Names = c("name1", "name2", "name3"), row.names = c(NA, 
-5L), class = "data.frame")

Hi all,

I have what feels like a simple coding question for R. The code above creates the following data frame:

  name1 name2 name3
1  Mike  Nick  Matt
2   Joe  Matt   Tom
3  Mike  Nick   Tom
4   Joe  Matt Steve
5  Mike  Nick   Tom

I would like a simple function that returns a boolean vector indicating if a particular name appears in a row (in any column) of this dataframe. For example:

myfunction("Matt")

# should return
c(TRUE, TRUE, FALSE, TRUE, FALSE).

since Matt appears in the 1st, 2nd and 4th rows. Any simple help with this is appreciated, thanks!

Canovice

4 Answers

Here is an option: use apply to go through the rows and %in% (a wrapper around match) to test each one.

apply(temp, 1, function(x) any(x %in% "Matt")) 
[1]  TRUE  TRUE FALSE  TRUE FALSE
www

I've come up with my own solution as well:

rowSums("Matt" == temp) > 0 

seems to do the trick

zx8754
Canovice
  • That's a usually neat way to do it. I'll also suggest ``Reduce(`|`, lapply(temp, `==`, "Matt"))`` – thelatemail Aug 22 '17 at 22:25
  • `rowSums` or `Reduce` are also about 20 times faster than looping over each row in `apply`, and about 60 times faster than using `by_row` in `dplyr/purrr` – thelatemail Aug 22 '17 at 22:35
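
The vectorised suggestions above can be wrapped into the kind of function the question asks for. This is only a minimal sketch, assuming exact (whole-cell) matching is wanted; the name `row_has_name` is hypothetical, and `temp` is rebuilt here so the block runs on its own.

```r
# Rebuild the example data from the question.
temp <- data.frame(
  name1 = factor(c("Mike", "Joe", "Mike", "Joe", "Mike")),
  name2 = c("Nick", "Matt", "Nick", "Matt", "Nick"),
  name3 = c("Matt", "Tom", "Tom", "Steve", "Tom"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each row in which any column equals `name`.
row_has_name <- function(df, name) {
  # Compare every column to `name`; as.character() makes factor and
  # character columns behave alike.
  hits <- lapply(df, function(col) as.character(col) == name)
  # Combine the per-column logical vectors element-wise with OR.
  Reduce(`|`, hits)
}

row_has_name(temp, "Matt")
# [1]  TRUE  TRUE FALSE  TRUE FALSE
```

Because the comparison runs once per column rather than once per row, it keeps the speed advantage mentioned in the comment above.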

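This solution uses dplyr and by_row (originally from purrr).

library(dplyr)     # for %>% and pull()
library(purrrlyr)  # by_row() now lives here

myFunction <- function(df, name) {
  # by_row() calls the function on each one-row data frame
  by_row(df, function(x) {name %in% x}, .collate = "cols") %>%
    pull(.out)
}
myFunction(temp, "Matt")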

by_row adds the boolean as a column. pull returns the column as a vector.

Update: the by_row function has been removed from purrr; it is now available in the purrrlyr package.

  • Is `by_row` from purrrlyr, not dplyr or purrr? – camille Dec 19 '19 at 14:13
  • @camille, at the time of writing the answer it was part of the `purrr`, but has since been deprecated in that library. Looks like the function is now in `purrrlyr` as you pointed out. – David Richards Dec 20 '19 at 19:57

There are other consistent and more general approaches with dplyr or purrr. They avoid the class coercion that happens when apply() converts the data frame to a matrix, the inefficiency and verbosity of for loops, and the limitations of the rowSums approach.
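
To make the coercion point concrete, here is a small sketch on made-up data (the data frame `df` and its columns are not from the question). It assumes a data frame that mixes a numeric and a character column, which is where the matrix conversion inside apply() tends to surprise.

```r
# A data frame mixing a numeric and a character column (made-up data).
df <- data.frame(id = c(1, 10), label = c("a", "b"), stringsAsFactors = FALSE)

# apply() first calls as.matrix(), which formats the numeric column as
# strings of a common width, so 1 is stored as " 1".
as.matrix(df)[, "id"]

# A row-wise lookup through apply() therefore misses the value "1" ...
via_apply <- apply(df, 1, function(x) any(x %in% "1"))

# ... while a column-wise comparison still sees the original values.
via_columns <- Reduce(`|`, lapply(df, function(col) col == "1"))

via_apply    # no row is flagged
via_columns  # the first row is flagged
```

With the all-text columns in the question this difference does not show up, which is why the apply() answer still returns the expected result there.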

With purrr's map and reduce, and stringr's str_detect:

library(purrr)
library(stringr)

temp%>%map(~str_detect(.x,'Matt'))%>%reduce(`|`)

With dplyr, using map%>%reduce, pmap%>%any, rowwise%>%any or if_any:

library(purrr)
library(dplyr)
library(stringr)

temp%>%mutate(has_Matt=map(., ~str_detect(.x, 'Matt'))%>%pmap_lgl(any))

#OR

temp%>%rowwise()%>%
        mutate(has_Matt=any(str_detect(c_across(everything()), "Matt")))

The most concise, with dplyr::if_any:

temp%>%mutate(has_Matt=if_any(everything(), ~.x=="Matt"))

If you want to define a new function that simplifies this operation, you can create a function with base R:

my_function<-function(dataframe, pattern){
        # use the pattern argument rather than a hard-coded name
        Reduce(`|`, Map(function(x) grepl(pattern, x), dataframe))
}

my_function(temp, "Matt")

[1]  TRUE  TRUE FALSE  TRUE FALSE
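
One thing to keep in mind when choosing between these variants: str_detect and grepl test whether a cell contains the pattern, while == (as in the if_any version) tests whether the cell equals it. The sketch below uses made-up data (the `people` data frame is not from the question) to show where the two can disagree, and one possible way to anchor the pattern.

```r
# Made-up data with names that merely contain "Matt".
people <- data.frame(
  first  = c("Matthew", "Matt", "Tom"),
  second = c("Nick", "Joe", "Mattias"),
  stringsAsFactors = FALSE
)

# Pattern matching: any cell containing "Matt" counts as a hit.
contains_matt <- Reduce(`|`, lapply(people, function(x) grepl("Matt", x)))

# Exact matching: only a cell equal to "Matt" counts as a hit.
equals_matt <- Reduce(`|`, lapply(people, function(x) x == "Matt"))

# Anchoring the pattern makes grepl() agree with the exact comparison.
anchored_matt <- Reduce(`|`, lapply(people, function(x) grepl("^Matt$", x)))

contains_matt  # TRUE TRUE TRUE
equals_matt    # FALSE TRUE FALSE
anchored_matt  # FALSE TRUE FALSE
```

For the data in the question both styles give the same answer, since no name there contains "Matt" as part of a longer string.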
GuedesBF