I have comma-separated strings in which words may be duplicated several times:
df <- data.frame(
  words = c("if,go,if,to,go,and,if,go,don't,is,give,to,my,go",
            NA,
            "like,like,so,many,times,like,so,one,no,no,no,bathroom"))
I would like to reduce the strings in the words column so that only the unique words remain, in order of first appearance. I've tried this regex, but the result it produces is far from what I need:
library(stringr)
str_extract_all(df$words, "(?<=\\s|^)(\\w+)(?=,|$)(?!\\1+)")
[[1]]
[1] "if"
[[2]]
[1] NA
[[3]]
[1] "like"
The result I need (preferably via a regex solution) is this:
[[1]]
[1] "if,go,to,and,don't,is,give,my"
[[2]]
[1] NA
[[3]]
[1] "like,so,many,times,one,no,bathroom"
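For reference, the target output above can be reproduced without a regex, which may be useful as a baseline for checking any regex answer. This is only a sketch: it assumes the words are separated by plain commas with no surrounding whitespace, and dedupe_words is a hypothetical helper name, not a function from stringr or base R.

```r
df <- data.frame(
  words = c("if,go,if,to,go,and,if,go,don't,is,give,to,my,go",
            NA,
            "like,like,so,many,times,like,so,one,no,no,no,bathroom"))

# dedupe_words is a hypothetical helper, introduced here for illustration only.
# It splits one string on literal commas, keeps the first occurrence of each
# word, and joins the survivors back together. NA input is passed through.
dedupe_words <- function(x) {
  if (is.na(x)) return(NA_character_)
  parts <- strsplit(x, ",", fixed = TRUE)[[1]]
  paste(unique(parts), collapse = ",")
}

# as.character() guards against the column being a factor (the default for
# data.frame() string columns in R versions before 4.0).
result <- lapply(as.character(df$words), dedupe_words)
result
```

unique() keeps elements in order of first appearance, which is why the word order matches the desired result.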