3

Is there a way in R to check whether a value in one column contains a value within another column?

In the below example, I am trying to see whether values in col2 are contained within the values in col1 (independently within each row) but getting a warning message: "argument 'pattern' has length > 1 and only the first element will be used".

The Flag column should show "Yes" for the first and last rows and "No" for the 2nd and 3rd rows. Any thoughts on how to resolve this would be greatly appreciated.

col1 <- c("R.S.U.L.C","S.I.W","P.U.E","A.E.N")
col2 <- c("R","U","I","N")

df2 <- data.frame(col1,col2)

df2$Flag <- ifelse(grepl(df2$col2,df2$col1),"Yes","No")
zx8754
Matt Gossett

4 Answers

4

df2$flag <- mapply(grepl, df2$col2, df2$col1)

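grepl() uses only the first element of its pattern argument, so mapply() is used here to call it once per row, pairing each value of col2 with the corresponding value of col1. Note that this returns TRUE/FALSE rather than "Yes"/"No".

See ?grepl: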

If a character vector of length 2 or more is supplied, the first element is used with a warning.
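As a minimal sketch of how the row-wise call could be turned into the "Yes"/"No" flag the question asks for: the data frame below is rebuilt from the question, while `hit`, `unname()`, and `fixed = TRUE` are additions for illustration and not part of the original answer. `fixed = TRUE` is assumed to be desirable here because the values in col2 are meant as literal text.

```r
# Rebuild the data from the question; stringsAsFactors = FALSE keeps both
# columns as character vectors regardless of the R version in use.
df2 <- data.frame(
  col1 = c("R.S.U.L.C", "S.I.W", "P.U.E", "A.E.N"),
  col2 = c("R", "U", "I", "N"),
  stringsAsFactors = FALSE
)

# Call grepl() once per row: the i-th pattern is tested against the i-th string.
# fixed = TRUE (an assumption about intent) treats each pattern as literal text.
# unname() drops the names that mapply() takes from the first argument.
hit <- unname(mapply(grepl, df2$col2, df2$col1, MoreArgs = list(fixed = TRUE)))

# Map the logical result to the labels used in the question.
df2$Flag <- ifelse(hit, "Yes", "No")
df2$Flag
# "Yes" "No"  "No"  "Yes"
```

Passing `fixed = TRUE` through `MoreArgs` keeps it constant across rows while the first two arguments vary element by element.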

luoar
4

We can use str_detect, which is vectorized over both pattern and string.

library(dplyr)
library(stringr)
df2 <- df2 %>% 
     mutate(Flag = c('No', 'Yes')[1+str_detect(col1, as.character(col2))])
df2
#       col1 col2 Flag
#1 R.S.U.L.C    R  Yes
#2     S.I.W    U   No
#3     P.U.E    I   No
#4     A.E.N    N  Yes
akrun
2

This can be done with a combination of sapply and grepl: loop along df2$col2 and grepl each value against the corresponding string in df2$col1.
Collapsing the two steps below into a one-liner is straightforward.

i <- sapply(seq_along(df2$col2), function(i) grepl(df2$col2[i], df2$col1[i]))
df2$Flag <- c("No", "Yes")[i + 1L]
df2
#       col1 col2 Flag
#1 R.S.U.L.C    R  Yes
#2     S.I.W    U   No
#3     P.U.E    I   No
#4     A.E.N    N  Yes
Rui Barradas
0

A tidy implementation of str_detect, using ifelse. Note that the use of fixed() ensures literal content matching. Otherwise, str_detect defaults to regex which can cause unexpected behaviour if the pattern column contains characters that are interpretable as regular expressions.

library(tidyverse)

df2 <- df2 %>% 
  mutate(Flag = ifelse(str_detect(col1, fixed(as.character(col2))), "Yes", "No"))

df2
#       col1 col2 Flag
#1 R.S.U.L.C    R  Yes
#2     S.I.W    U   No
#3     P.U.E    I   No
#4     A.E.N    N  Yes
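To illustrate why literal matching matters, here is a small sketch using base R's grepl() with `fixed = TRUE`, which plays a role similar to stringr's fixed(). The vector `x` and the names `regex_hit` and `literal_hit` are made up for this example. It is especially relevant to data like col1, where the values are separated by dots.

```r
# Two example strings: only the first actually contains a dot.
x <- c("a.b", "ab")

# Interpreted as a regular expression, "." matches any single character,
# so both strings are reported as matches.
regex_hit <- grepl(".", x)
regex_hit
# TRUE TRUE

# With fixed = TRUE the pattern is treated as literal text,
# so only the string containing an actual dot matches.
literal_hit <- grepl(".", x, fixed = TRUE)
literal_hit
# TRUE FALSE
```

If the pattern column could ever contain characters such as `.`, `*`, or `(`, literal matching avoids these silent false positives.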
GGAnderson