Let's do it with regex, to have more problems:
df.text = df.text.str.replace(r"(?<=[.!?])[^.!?]*:\s*$", "", regex=True)
Now df.text.tolist() is
['Trump met with Putin.',
'New movie by Christopher Nolan!',
'Campers: Get ready to stop COVID-19 in its tracks!',
'London was building a bigger rival to the Eiffel Tower. Then it all went wrong.',
"I don't want to do a national lockdown again. If #coronavirus continues to 'progress' in the UK."]
Lookbehind ftw (a fixed-width one here, since it checks a single character).
On regex:
(?<=[.!?])
This is a "lookbehind". It doesn't consume any characters; it only asserts that whatever follows it in the pattern is immediately preceded by something. That something is the character class [.!?], which means either . or ! or ?.
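A minimal sketch of the lookbehind on its own, using the standard re module; the sample strings and the \s*\w+ tail are made up for illustration and are not part of the pattern above:

```python
import re

# The lookbehind demands a ., ! or ? immediately before the match,
# but that character is not included in what is matched.
m = re.search(r"(?<=[.!?])\s*\w+", "Stop! Go")
print(repr(m.group()))  # ' Go'  (the "!" itself is left out)

# Without any ., ! or ? in the string, the assertion never holds.
no_match = re.search(r"(?<=[.!?])\s*\w+", "Stop Go")
print(no_match)  # None
```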
[^.!?]*
Again we have a character class in square brackets, but now with a caret ^ as its first character, which negates the class: we want any character except those listed. So any character other than . or ! or ? will do.
The * after the character class is the 0-or-more quantifier, meaning the "any character but .?!" can repeat any number of times, including zero, and it takes as many as possible.
So far, the match starts right after a . or ? or !, and then runs over a stream of characters that can be "anything but .?!". This ensures that what we match comes after the last sentence: the "anything but" stream cannot cross another . or ? or ! on its way.
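To see why the match can only begin after the last sentence, here is a small sketch with the first two pieces plus an end anchor; the sample text is hypothetical:

```python
import re

text = "First one. Second one! Tail: "

# [^.!?]* cannot step over "!" so a match starting after the first "."
# can never reach the end of the string; only the start after "!" works.
m = re.search(r"(?<=[.!?])[^.!?]*$", text)
print(repr(m.group()))  # ' Tail: '
```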
:\s*$
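With : we require a literal colon right after the 0-or-more stream above. The stream may itself contain colons; together with the rest of the pattern, it is the last : before the end of the string that counts. If there is no such :, nothing matches and no replacement happens, as desired.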
The \s* after it allows optional whitespace (again, 0 or more due to *; \s means any whitespace character, such as a space, tab or newline) after the :. You can remove it if you are certain there will never be whitespace after the :.
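A quick sketch of what dropping \s* would change, again with the standard re module and a made-up string:

```python
import re

trailing = "Read more:  "   # hypothetical text with spaces after the colon

with_ws = re.search(r":\s*$", trailing)      # tolerates the trailing spaces
without_ws = re.search(r":$", trailing)      # colon must be the very last character
stripped = re.search(r":$", "Read more:")    # fine once nothing follows the colon

print(bool(with_ws), bool(without_ws), bool(stripped))  # True False True
```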
Lastly we have $: this matches at the end of the string (a position, not a physical character). So we are sure that the string ends with : followed optionally by some whitespace.
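Putting the pieces together, here is a sketch that applies the full pattern with re.sub instead of the pandas str.replace call, so it runs without pandas. The input strings are hypothetical (the trailing "Read more: " label in particular is an assumption for illustration):

```python
import re

PATTERN = r"(?<=[.!?])[^.!?]*:\s*$"

samples = [
    "Trump met with Putin. Read more: ",                   # trailing label after a sentence
    "Campers: Get ready to stop COVID-19 in its tracks!",  # colon is not at the end
    "No sentence punctuation here:",                       # no . ! ? before the colon part
]

cleaned = [re.sub(PATTERN, "", s) for s in samples]
for before, after in zip(samples, cleaned):
    print(repr(before), "->", repr(after))
```

Only the first string changes: the second has no : before the end of the string, and in the third the lookbehind never finds a . or ! or ? to start from.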