Suppose I have a string that looks like this:

    s = "this is a random random this is a random Sentence sentence where phrases and words words repeat. This is the the second sentence sentence of the Same same paragraph"

I want the output to be:

    this is a random sentence where phrases and words repeat. This is the second sentence of the same paragraph

This is what I have tried. It handles repeated words and phrases, but it does not handle duplicates that differ only in case, such as Sentence sentence and Same same:

    import re

    s = "this is a random random this is a random Sentence sentence where phrases and words words repeat. This is the the second sentence sentence of the Same same paragraph"

    def postprocess(s):
        # Keep collapsing adjacent repeats until the pattern no longer matches.
        while re.search(r'\b(.+)(\s+\1\b)+', s):
            s = re.sub(r'\b(.+)(\s+\1\b)+', r'\1', s)
        return s

    postprocess(s)

The output it returns is:

    this is a random this is a random Sentence sentence where phrases and words repeat. This is the second sentence of the Same same paragraph

Can anyone help me here?
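
One possible direction, offered as a sketch rather than a definitive fix: compile the same pattern with `re.IGNORECASE`, so that the backreference `\1` also matches a copy that differs only in case. Since the desired output keeps the lowercase `sentence` and `same`, this sketch assumes the last copy of each repeat is the one to keep; it wraps the repeated copy in a second group and substitutes `\2`. The name `postprocess_ignore_case` is made up for illustration.

```python
import re

# With IGNORECASE, the backreference \1 also matches "sentence" after "Sentence".
# Group 2 wraps each repeated copy; after a match it holds the LAST repetition.
_REPEAT = re.compile(r'\b(.+)(?:\s+(\1)\b)+', flags=re.IGNORECASE)

def postprocess_ignore_case(s):
    """Collapse adjacent repeated words/phrases, ignoring case.

    Assumption: the last copy of each repeat is the one that is kept.
    """
    while True:
        # subn returns the new string and the number of replacements made.
        s, n = _REPEAT.subn(r'\2', s)
        if n == 0:  # nothing left to collapse
            return s

s = ("this is a random random this is a random Sentence sentence where "
     "phrases and words words repeat. This is the the second sentence "
     "sentence of the Same same paragraph")
print(postprocess_ignore_case(s))
```

On the sample string this should give the desired output shown earlier in the question. One side effect of keeping the last copy: a repeat at the start of a sentence, such as `This this`, would come out lowercase; substituting `\1` instead keeps the first copy.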