6
   id first  middle  last       Age
    1 Carol  Jenny   Smith      15
    2 Sarah  Carol   Roberts    20
    3 Josh   David   Richardson 22

I am trying find a specific name in ANY of the name columns (first, middle, last). For example, if I found anyone with a name Carol (doesn't matter if it's the first/middle/last name), I want to mutate a column 'Carol' and give 1. So what I want is the following

   id first  middle  last       Age  Carol
    1 Carol   Jenny   Smith      15   1
    2 Sarah  Carol   Roberts    20    1
    3 Josh   David   Richardson 22    0

I have been trying ifelse(c(first, middle,last) == "Carol" , 1, 0 ) or "Carol" %in% first...etc but for some reason I can only work on one column instead of multiple columns.. Could anyone help me please? Thank you in advance!

camille
  • 16,432
  • 18
  • 38
  • 60
JNB
  • 161
  • 2
  • 8
  • Does this answer your question? [in R, check if string appears in row of dataframe (in any column)](https://stackoverflow.com/questions/45827337/in-r-check-if-string-appears-in-row-of-dataframe-in-any-column) – camille Dec 19 '19 at 14:20

5 Answers5

6

We can use rowSums

df$Carol <- as.integer(rowSums(df[2:4] == "Carol") > 0)

df
#  id first middle       last Age Carol
#1  1 Carol  Jenny      Smith  15     1
#2  2 Sarah  Carol    Roberts  20     1
#3  3  Josh  David Richardson  22     0

If we need it as a function

fun <- function(df, value) {
   as.integer(rowSums(df[2:4] == value) > 0)
}

fun(df, "Carol")
#[1] 1 1 0
fun(df, "Sarah")
#[1] 0 1 0

but this assumes the columns you want to search are at position 2:4.

To give more flexibility with column position

fun <- function(df, cols, value) {
   as.integer(rowSums(df[cols] == value) > 0)
 }
fun(df, c("first", "last","middle"), "Carol")
#[1] 1 1 0
fun(df, c("first", "last","middle"), "Sarah")
#[1] 0 1 0
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • i see..why can't we use ifelse here though? just wondering..is it possible to use mutate? – JNB Mar 28 '19 at 16:10
  • 2
    @Molly we can but then it would not scale up. Imagine you need to this for 10-20 columns. – Ronak Shah Mar 28 '19 at 16:18
  • My solution simply uses mutate and if_else and works, but as @RonakShah pointed out it is to much work if scaled up to multiple columns. – Mojoesque Mar 28 '19 at 16:22
3

Here's a tidyverse option. We first reshape the data to long format, group by id, and find levels of id that have the desired name in at least one row. Then we reshape back to wide format.

library(tidyverse)

df %>% 
  gather(key, value, first:last) %>% 
  group_by(id) %>% 
  mutate(Carol = as.numeric(any(value=="Carol"))) %>% 
  spread(key, value)
     id   Age Carol first last       middle
1     1    15     1 Carol Smith      Jenny 
2     2    20     1 Sarah Roberts    Carol 
3     3    22     0 Josh  Richardson David

Or, as a function:

find.target = function(data, target) {

  data %>% 
    gather(key, value, first:last) %>% 
    group_by(id) %>% 
    mutate(!!target := as.numeric(any(value==target))) %>% 
    spread(key, value) %>% 
    # Move new target column to end
    select(-target, target)

}

find.target(df, "Carol")
find.target(df, "Sarah")

You could also do several at once. For example:

map(c("Sarah", "Carol", "David"), ~ find.target(df, .x)) %>% 
  reduce(left_join)
     id   Age first last       middle Sarah Carol David
1     1    15 Carol Smith      Jenny      0     1     0
2     2    20 Sarah Roberts    Carol      1     1     0
3     3    22 Josh  Richardson David      0     0     1
eipi10
  • 91,525
  • 24
  • 209
  • 285
2

Using tidyverse

library(tidyverse)
f1 <- function(data, wordToCompare, colsToCompare) {
          wordToCompare <- enquo(wordToCompare)
          data %>%
              select(colsToCompare) %>%
              mutate(!! wordToCompare :=  map(.,  ~ 
       .x == as_label(wordToCompare)) %>% 
           reduce(`|`) %>%
           as.integer)
              }
          
f1(df1, Carol, c("first", 'middle', 'last'))
# first middle       last Carol
#1 Carol  Jenny      Smith     1
#2 Sarah  Carol    Roberts     1
#3  Josh  David Richardson     0

f1(df1, Sarah, c("first", 'middle', 'last'))
#   first middle       last Sarah
#1 Carol  Jenny      Smith     0
#2 Sarah  Carol    Roberts     1
#3  Josh  David Richardson     0

Or this can also be done with pmap

df1 %>%
  mutate(Carol = pmap_int(.[c('first', 'middle', 'last')],
          ~ +('Carol' %in% c(...))))
#   id first middle       last Age Carol
#1  1 Carol  Jenny      Smith  15     1
#2  2 Sarah  Carol    Roberts  20     1
#3  3  Josh  David Richardson  22     0

which can be wrapped into a function

f2 <- function(data, wordToCompare, colsToCompare) {
      wordToCompare <- enquo(wordToCompare)
      data %>%
           mutate(!! wordToCompare := pmap_int(.[colsToCompare],
          ~ +(as_label(wordToCompare) %in% c(...))))
  } 

f2(df1, Carol, c("first", 'middle', 'last'))
#  id first middle       last Age Carol
#1  1 Carol  Jenny      Smith  15     1
#2  2 Sarah  Carol    Roberts  20     1
#3  3  Josh  David Richardson  22     0

NOTE: Both the tidyverse methods doesn't require any reshaping


With base R, we can loop through the 'first', 'middle', 'last' column and use == for comparison to get a list of logical vectors, which we Reduce to a single logical vector with | and coerce it to binary with +

df1$Carol <- +(Reduce(`|`, lapply(df1[2:4], `==`, 'Carol')))
df1
#  id first middle       last Age Carol
#1  1 Carol  Jenny      Smith  15     1
#2  2 Sarah  Carol    Roberts  20     1 
#3  3  Josh  David Richardson  22     0

NOTE: There are dupes for this post. For e.g. here

data

df1 <- structure(list(id = 1:3, first = c("Carol", "Sarah", "Josh"), 
middle = c("Jenny", "Carol", "David"), last = c("Smith", 
"Roberts", "Richardson"), Age = c(15L, 20L, 22L)),
  class = "data.frame", row.names = c(NA, 
 -3L))

          
   
Community
  • 1
  • 1
akrun
  • 874,273
  • 37
  • 540
  • 662
1

A solution using apply family

df$Carol = lapply(1:nrow(df), function(x) any(df[x,]=="Carol))
LocoGris
  • 4,432
  • 3
  • 15
  • 30
1

Another option using mutate and if_else() as you suggested:

library(tidyverse)

data = read_table("   id first  middle  last       Age
    1 Carol  Jenny   Smith      15
    2 Sarah  Carol   Roberts    20
    3 Josh   David   Richardson 22")
data %>%
  mutate(carol = if_else(first == "Carol" | middle == "Carol" | last == "Carol",
                         "yes",
                         "no"))

Result:

# A tibble: 3 x 6
     id first middle last         Age carol
  <dbl> <chr> <chr>  <chr>      <dbl> <chr>
1     1 Carol Jenny  Smith         15 yes  
2     2 Sarah Carol  Roberts       20 yes  
3     3 Josh  David  Richardson    22 no 
Mojoesque
  • 1,166
  • 8
  • 15