I'd like to remove all paragraphs that start with any string in a list (case-insensitive): ["keyword", "disclosure"]
My code:
re.sub("(?i)\n(keyword|disclosure).*(\n|$)", "\n", txt)
This works fine if there is at least one paragraph between the bad paragraphs, but it does not work if there is more than one bad paragraph in a row.
For example:
Text text text
Keywords: text text, text. Texts
Disclosures of stuff text more texts
Stuff text text
This results in the subsequent bad paragraph being missed:
Text text text
Disclosures of stuff text more texts
Stuff text text
Instead of what I would like to see:
Text text text
Stuff text text
How can I ensure all repeated matches are also replaced? Preferably I'd also like repeated matches treated as a single match so I don't get extra newlines, but if it's much cleaner and easier to just collapse repeated newlines into a single newline afterwards, that's OK.
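One possible explanation and workaround, offered as a sketch rather than a definitive fix: re.sub only replaces non-overlapping matches, and the pattern above consumes the trailing "\n" of the first bad paragraph, so the next bad paragraph no longer has a leading "\n" available to match. Anchoring on the start of each line with re.MULTILINE avoids needing that leading newline at all. The helper name drop_paragraphs and the constant BAD_PREFIXES are made up for illustration, and the sketch assumes paragraphs are separated by single "\n" characters, as in the example.

```python
import re

# Prefixes taken from the question; the constant name is hypothetical.
BAD_PREFIXES = ["keyword", "disclosure"]


def drop_paragraphs(txt, prefixes=BAD_PREFIXES):
    """Remove every line that starts with one of the prefixes (case-insensitive).

    Hypothetical helper: assumes one paragraph per line, separated by "\n".
    """
    # Escape each prefix so it is matched literally, then join into an alternation.
    alternation = "|".join(re.escape(p) for p in prefixes)
    # With re.MULTILINE, ^ matches at the start of every line, so no leading "\n"
    # has to be consumed and consecutive bad lines each match on their own.
    # The optional trailing "\n" is removed together with the line.
    pattern = re.compile(rf"^(?:{alternation}).*\n?", re.IGNORECASE | re.MULTILINE)
    return pattern.sub("", txt)


txt = (
    "Text text text\n"
    "Keywords: text text, text. Texts\n"
    "Disclosures of stuff text more texts\n"
    "Stuff text text"
)

cleaned = drop_paragraphs(txt)
print(cleaned)
```

Because each bad line is removed along with its own trailing newline, no extra blank lines are left behind in the example, so a second pass to collapse repeated newlines should not be needed. If a bad paragraph is the very last line, the newline before it is kept, so the result ends with a trailing "\n".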