
This is my code:

searchvector <- c("good", "wonderful", "bad", "great", "wonder")


> grepl("wonder", searchvector)
[1] FALSE  TRUE FALSE FALSE  TRUE
> grepl(paste0("\\b", "wonder", "\\b"), searchvector)
[1] FALSE FALSE FALSE FALSE  TRUE
> grepl(paste0("\\baudible\\b|\\b|\\bthalia\\b"), searchvector)
[1] TRUE TRUE TRUE TRUE TRUE

I have a large vector of text in which I want to separate each word to calculate sentiment scores. I only want to match exact strings, which I managed to do with \\b.

However, some patterns match every element of searchvector, as you can see in the last call. I was not able to figure out why that is the case. Can anyone explain what goes wrong here?

1 Answer


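Your pattern contains a "standalone" \\b alternative (the one between the two | characters). A bare word boundary matches any input that contains at least one word character, so every element of searchvector matches.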
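To see the effect in isolation, here is a minimal sketch. The test vector `x` is made up for illustration, and `perl = TRUE` is my choice here so that the behaviour is that of the PCRE engine:

```r
# Hypothetical inputs: two ordinary words, an empty string,
# and a string without any word characters.
x <- c("good", "wonderful", "", "!!!")

# A bare word boundary matches wherever a word character starts or ends.
standalone <- grepl("\\b", x, perl = TRUE)

# The pattern from the question behaves the same way, because the
# middle alternative is exactly that bare \\b.
original <- grepl("\\baudible\\b|\\b|\\bthalia\\b", x, perl = TRUE)

print(standalone)
# [1]  TRUE  TRUE FALSE FALSE
print(original)
# [1]  TRUE  TRUE FALSE FALSE
```

Only strings with no word characters at all fail to match, which is why the whole searchvector comes back TRUE.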

Remove the standalone \\b alternative, and wrap the words in a non-capturing group so that \\b is written only once on each side:

grepl(paste0("\\b(?:audible|thalia)\\b"), searchvector) 

R demo:

> searchvector <- c("good", "wonderful", "bad", "great", "wonder")
> grepl(paste0("\\b(?:audible|thalia)\\b"), searchvector)
[1] FALSE FALSE FALSE FALSE FALSE
> searchvector <- c("good", "wonderful", "bad", "great", "wonder", "thalia item")
> grepl(paste0("\\b(?:audible|thalia)\\b"), searchvector)
[1] FALSE FALSE FALSE FALSE FALSE  TRUE
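
If the pattern is built from a vector of words, an empty string in that vector is one way to end up with a stray \\b alternative. That this happened in the question is only my assumption. The sketch below uses a hypothetical `words` vector, drops empty entries, and builds the grouped pattern; `perl = TRUE` is again my choice:

```r
# Hypothetical word list; the empty string stands in for a blank entry.
words <- c("audible", "", "thalia")

# Drop empty entries so no empty alternative ends up in the pattern.
words <- words[nzchar(words)]

# Join with | and wrap once in \\b(?:...)\\b.
pattern <- paste0("\\b(?:", paste(words, collapse = "|"), ")\\b")

searchvector <- c("good", "wonderful", "bad", "great", "wonder", "thalia item")
result <- grepl(pattern, searchvector, perl = TRUE)

print(pattern)
# [1] "\\b(?:audible|thalia)\\b"
print(result)
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE
```

Note that this assumes the words contain no regex metacharacters; entries such as "c++" would need escaping before being pasted into the pattern.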
Wiktor Stribiżew