I was wondering whether it is possible to remove duplicate sentences, or even duplicated blocks of text (a repeated set of sentences), from a data frame in R. In my specific case, imagine I have saved the posts of a forum but did not mark where a person quoted an earlier post, and I now want to remove all quotes from the cells containing the posts. Thanks for any tips or hints.
An example could look something like this:
names <- c("Richard", "Mortimer", "Elizabeth", "Jeremiah")
posts <- c("I'm trying to find a solution for a problem with my neighbour, she keeps mowing the lawn on sundays when I'm trying to sleep in from my night shift", "Personally, I like to deal with annoying neighbours by just straight up confronting them. Don't shy away. There are always ways to work things out.", "Personally, I like to deal with annoying neighbours by just straight up confronting them. Don't shy away. There are always ways to work things out. That sounds quite aggressive. How about just talking to them in a friendly way, first?", "That sounds quite aggressive. How about just talking to them in a friendly way, first? Didn't mean to sound aggressive, rather meant just being straightforward, if that makes any sense")
duplicateposts <- data.frame(names, posts)
And the result I would like to end up with looks like this:
posts2 <- c("I'm trying to find a solution for a problem with my neighbour, she keeps mowing the lawn on sundays when I'm trying to sleep in from my night shift", "Personally, I like to deal with annoying neighbours by just straight up confronting them. Don't shy away. There are always ways to work things out.", "That sounds quite aggressive. How about just talking to them in a friendly way, first?", "Didn't mean to sound aggressive, rather meant just being straightforward, if that makes any sense")
postsnoduplicates <- data.frame(names, posts2)
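For what it's worth, here is a minimal sketch of one possible approach, not a definitive solution. It assumes that quotes are verbatim copies of earlier text and that the rows are in chronological order, so a quote always refers to a row above it. `remove_quotes` is a hypothetical helper name, and the `toy` vector is made-up data just to keep the block self-contained.

```r
# Hypothetical helper (not from any package): strips text that already
# appeared in an earlier post. Assumes quotes are verbatim copies and that
# posts are ordered chronologically.
remove_quotes <- function(posts) {
  posts <- as.character(posts)  # in case the column was read in as a factor
  cleaned <- character(length(posts))
  for (i in seq_along(posts)) {
    text <- posts[i]
    # Remove the already-cleaned text of every earlier post
    for (earlier in cleaned[seq_len(i - 1)]) {
      if (nzchar(earlier)) {
        # fixed = TRUE treats the earlier post as a literal string, not a regex
        text <- gsub(earlier, "", text, fixed = TRUE)
      }
    }
    # Collapse leftover whitespace and trim both ends
    cleaned[i] <- trimws(gsub("\\s+", " ", text))
  }
  cleaned
}

# Small made-up example
toy <- c("First post.", "First post. A reply.", "A reply. Another reply.")
result <- remove_quotes(toy)
```

With the data frame above, this could be applied as `duplicateposts$posts <- remove_quotes(duplicateposts$posts)`. Comparing against the cleaned earlier posts rather than the raw ones is what lets it catch a quote of only the new part of a reply. Two caveats: very short posts (e.g. "Yes.") would also be stripped wherever they reappear, and quotes that were edited or truncated will not match at all.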