
This is very similar to this question, but with an added layer. I want to check whether the string in one column appears in another column. Because the first column is empty in some rows, the code below returns a lot of 'TRUE' values, apparently because the empty values just match spaces. How can I ignore the empty values and match only on characters?

word <- c('Hello','','nyc', '')
keywords <- c('hello goodbye nyc','hello goodbye nyc', 'hello goodbye nyc', 'hello goodbye nyc')
df <- data.frame(word, keywords, stringsAsFactors = FALSE)

What I want is to add a new column (word_exists) that tells me whether the string in column 'word' exists among 'keywords'. I tried:

df$word_exists <- mapply(grepl, pattern=df$word, x=df$keywords)

But I get all 'TRUE', and I think it is because it is recognizing the spaces in 'keywords' and matching them to the empty 'word' values. Any suggestions? Thanks!

Agustín Indaco

2 Answers


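grepl returns TRUE for an empty pattern no matter what x contains, which is why the blank rows match. Just use nzchar to check that your pattern has characters: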

transform(df, word_exists=mapply(grepl, pattern=word, x=keywords) & nzchar(word))
#    word          keywords word_exists
# 1 Hello hello goodbye nyc       FALSE
# 2       hello goodbye nyc       FALSE
# 3   nyc hello goodbye nyc        TRUE
# 4       hello goodbye nyc       FALSE
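Note that 'Hello' comes back FALSE above because grepl is case-sensitive by default. If a case-insensitive match is what you are after (that is an assumption on my part, the question does not say), a minimal sketch could lower-case both sides and use fixed = TRUE so each word is treated as a literal string rather than a regular expression:

```r
# Same sample data as in the question
word <- c('Hello', '', 'nyc', '')
keywords <- rep('hello goodbye nyc', 4)
df <- data.frame(word, keywords, stringsAsFactors = FALSE)

# An empty pattern matches any string, so this is TRUE
empty_matches <- grepl('', 'hello goodbye nyc')

# nzchar() guards against the empty patterns.
# tolower() on both sides is an assumption (case-insensitive matching);
# fixed = TRUE matches each word literally instead of as a regex.
df$word_exists <- nzchar(df$word) &
  unname(mapply(grepl,
                pattern = tolower(df$word),
                x = tolower(df$keywords),
                MoreArgs = list(fixed = TRUE)))

df$word_exists
# [1]  TRUE FALSE  TRUE FALSE
```

tolower is used instead of ignore.case = TRUE because grepl ignores ignore.case (with a warning) when fixed = TRUE.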
BrodieG

A quick fix would be to replace your blank strings with NAs. Something like this works:

df$word[df$word == ""] <- NA
df$word_exists <- as.logical(mapply(grepl, pattern=df$word, x=df$keywords))

   word          keywords word_exists
1 Hello hello goodbye nyc       FALSE
2  <NA> hello goodbye nyc          NA
3   nyc hello goodbye nyc        TRUE
4  <NA> hello goodbye nyc          NA
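If you would rather have FALSE than NA for the blank rows (an assumption about what the asker needs), one possible follow-up is to overwrite the NAs afterwards. A small self-contained sketch, where hits is just a hypothetical helper name:

```r
# Sample data with the blank strings already replaced by NA
word <- c('Hello', NA, 'nyc', NA)
keywords <- rep('hello goodbye nyc', 4)
df <- data.frame(word, keywords, stringsAsFactors = FALSE)

# grepl() returns NA when the pattern is NA
hits <- as.logical(mapply(grepl, pattern = df$word, x = df$keywords))

# Treat "no word to look for" as "not found"
hits[is.na(hits)] <- FALSE
df$word_exists <- hits

df$word_exists
# [1] FALSE FALSE  TRUE FALSE
```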
Mike H.