Let's say my data looks like this:
vector = c("Happiness with KK Happiness without KK", "I love some coding I love major coding", "fun 2 fun 3")
I want to remove ALL duplicate words within each string, including the first instance of each duplicated word. So my output should look like this:
[1] "with without"
[2] "some major"
[3] "2 3"
Basically, it's similar to this problem: "How do keep only unique words within each string in a vector". However, I don't want to keep even the first instance of a duplicated word.
I tried to use strsplit() along " " and duplicated() to split each string into its individual words and then detect duplicates.
The issue with using duplicated() is that its logical vector marks only the second and later occurrences of a word as TRUE, so the first instance is never flagged. Furthermore, strsplit() returns a list, which really complicates things. For example, the usual way of subsetting the duplicates (something like df[duplicated(df)]) doesn't work on a list the way I need it to, because it compares whole list elements rather than the words inside each element.
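To make the problem concrete, here is a minimal sketch of what I see when I run this on the data above. The names words and first are just placeholders for this illustration.

```r
vector <- c("Happiness with KK Happiness without KK",
            "I love some coding I love major coding",
            "fun 2 fun 3")

# strsplit() returns a list holding one character vector per input string
words <- strsplit(vector, " ")
is.list(words)
# TRUE

# Look at the words of the first string only
first <- words[[1]]
# "Happiness" "with" "KK" "Happiness" "without" "KK"

# duplicated() flags only the repeats, never the first occurrence
duplicated(first)
# FALSE FALSE FALSE TRUE FALSE TRUE

# So dropping the flagged words still keeps the first "Happiness" and "KK"
first[!duplicated(first)]
# "Happiness" "with" "KK" "without"
```

What I actually want for this first string is just "with" and "without".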