
Consider the following dataset:

words <- c("un ou deux", "partout", "desktop", "top cinema", "book", "best cover")
dataset <- data.frame(words)

That looks like this:

       words
1 un ou deux
2    partout
3    desktop
4 top cinema
5       book
6 best cover

My goal is to assign each row to a category based on the keywords its string contains:

value1 <- c("top", "ou")
value2 <- c("best")

dataset$category[grepl(paste(value1,collapse = "|"),dataset$words)]="value1"
dataset$category[grepl(paste(value2,collapse = "|"),dataset$words)]="value2"

Using grepl, I get this output:

       words category
1 un ou deux   value1
2    partout   value1
3    desktop   value1
4 top cinema   value1
5       book     <NA>
6 best cover   value2

My issue: I don't want to assign a category to a row unless a keyword matches a whole word EXACTLY. For example, desktop should not be assigned to value1 just because "top" appears inside the string "desktop". Do you have any idea how to do this? Thanks for your help!
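The substring behaviour comes from the pattern that `paste(value1, collapse = "|")` builds: it is the plain alternation `top|ou`, which `grepl` matches anywhere inside a string. A minimal check using the same vectors as above:

```r
words <- c("un ou deux", "partout", "desktop", "top cinema", "book", "best cover")
value1 <- c("top", "ou")

# paste() joins the keywords into a single regex alternation
pattern <- paste(value1, collapse = "|")
print(pattern)
# [1] "top|ou"

# grepl() reports a match wherever "top" or "ou" occurs as a substring,
# so "partout" (via "ou") and "desktop" (via "top") are matched as well
print(grepl(pattern, words))
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
```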

The final dataset should look like this:

       words category
1 un ou deux   value1
2    partout     <NA>
3    desktop     <NA>
4 top cinema   value1
5       book     <NA>
6 best cover   value2
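One way to get this result without regular expressions (a swapped-in technique, not taken from the thread) is to split each string into space-separated tokens and require an exact token match with `%in%`. This is a sketch assuming words are separated by single spaces; the helper name `has_keyword` is hypothetical:

```r
words <- c("un ou deux", "partout", "desktop", "top cinema", "book", "best cover")
# stringsAsFactors = FALSE keeps words as character (strsplit rejects factors)
dataset <- data.frame(words, stringsAsFactors = FALSE)
value1 <- c("top", "ou")
value2 <- c("best")

# Split each string on single spaces (assumption: no other separators)
tokens <- strsplit(dataset$words, " ", fixed = TRUE)

# Hypothetical helper: TRUE when any token equals one of the keywords exactly
has_keyword <- function(tokens, keywords) {
  vapply(tokens, function(tok) any(tok %in% keywords), logical(1))
}

dataset$category <- NA_character_
dataset$category[has_keyword(tokens, value1)] <- "value1"
dataset$category[has_keyword(tokens, value2)] <- "value2"
print(dataset)
```

Because tokens are compared as whole strings, punctuation stays attached: "top," would not count as "top".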
Remi
  • `dataset$category[grepl(paste0("\\b(?:",paste(value1,collapse = "|"),")\\b"),dataset$words)]="value1"` and `dataset$category[grepl(paste0("\\b(?:",paste(value2,collapse = "|"),")\\b"),dataset$words)]="value2"` – Wiktor Stribiżew Apr 17 '18 at 12:24
  • Thanks Wiktor, that works perfectly. I tried the fixed = TRUE parameter in grepl, but your method works better! – Remi Apr 17 '18 at 12:30
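The word-boundary pattern from the comment can be generated for any number of categories. In the sketch below, the named list `categories` and the function `assign_category` are hypothetical, and the keywords are assumed to contain no regex metacharacters. `perl = TRUE` is passed so that `\b` and the non-capturing group `(?:...)` are handled by PCRE. This also shows why `fixed = TRUE` could not do the job: it makes `grepl` match the pattern as literal text, so `"top|ou"` would be searched for verbatim and word boundaries cannot be expressed at all.

```r
words <- c("un ou deux", "partout", "desktop", "top cinema", "book", "best cover")
dataset <- data.frame(words, stringsAsFactors = FALSE)

# Hypothetical named list: category label -> keywords (assumed regex-safe)
categories <- list(value1 = c("top", "ou"), value2 = c("best"))

assign_category <- function(x, categories) {
  result <- rep(NA_character_, length(x))
  for (label in names(categories)) {
    # \b...\b requires each keyword to stand as a whole word;
    # (?:...) groups the alternation without capturing
    pattern <- paste0("\\b(?:", paste(categories[[label]], collapse = "|"), ")\\b")
    # later categories overwrite earlier ones, as in the question's code
    result[grepl(pattern, x, perl = TRUE)] <- label
  }
  result
}

dataset$category <- assign_category(dataset$words, categories)
print(dataset)
```

With the question's data this yields value1 for "un ou deux" and "top cinema", value2 for "best cover", and NA for the rest, matching the desired table.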
