Suppose I have a string such as
'I hate *some* kinds of duplicate. This string has a duplicate phrase, duplicate phrase.'
I want to remove the second occurrence of 'duplicate phrase' without removing other occurrences of its constituent parts, such as the other use of 'duplicate'.
Moreover, I need to remove all potential duplicate phrases, not just the duplicates of some specific phrase that I know in advance.
I have found several posts on similar problems, but none that have helped me solve my particular issue:
I had hoped to adapt the approach from the last link there (re.sub(r'\b(.+)(\s+\1\b)+', r'\1', s)) for my purposes, but could not figure out how to do so.
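For reference, here is a minimal reproduction of that attempt on the sample string above. The pattern is copied unchanged and only the variable names are mine. As far as I can tell, it only collapses repeats that are separated by whitespace alone, so the comma between the two copies of the phrase stops it from matching:

```python
import re

s = ('I hate *some* kinds of duplicate. '
     'This string has a duplicate phrase, duplicate phrase.')

# \s+\1 requires the repeat to follow after whitespace only;
# here the two copies are separated by ", " (comma + space).
result = re.sub(r'\b(.+)(\s+\1\b)+', r'\1', s)

print(result == s)  # True: the string comes back unchanged
```

So the pattern leaves the trailing ', duplicate phrase' in place, which is the part I am trying to remove.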
How do I remove all arbitrary duplicate phrases of two or more words from a string in Python?