I'm working on a project that splits a sentence into a list of separate strings so that I can replace certain words. I want to preserve the original punctuation so that I can reassemble the sentence exactly as it was. However, I don't want the punctuation attached to the preceding word, because then the word-matching logic won't work.
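A minimal sketch of the overall plan, assuming that splitting on runs of non-word characters is acceptable: `re.split` with a capturing group keeps the separators (spaces and punctuation) as their own list items, so every word stays clean and the sentence can be rebuilt exactly with `''.join`. This is a swapped-in approach (split instead of `findall`), and the `REPLACEMENTS` table is hypothetical, for illustration only.

```python
import re

SENTENCE = 'Now is the time for all good men, to come to the aid of: their country.'

# Hypothetical replacement table, for illustration only.
REPLACEMENTS = {'men': 'people', 'country': 'planet'}

# The capturing group makes re.split keep each separator (e.g. ', ' or ': ')
# as its own list item, so words never carry trailing punctuation.
tokens = re.split(r'(\W+)', SENTENCE)

# Replace only tokens that exactly match a key; separators pass through unchanged.
replaced = [REPLACEMENTS.get(tok, tok) for tok in tokens]

# Joining with '' restores the original spacing and punctuation.
result = ''.join(replaced)
print(result)
```

Because the separators are kept verbatim, no spacing rules are needed at reassembly time. One caveat: `\W+` also treats apostrophes as separators, so a word like "don't" would be split into three tokens.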
I have tried writing the regex as a raw string, adding a + after my character class, and using grouping parentheses, but nothing works. Lots of googling suggests the problem is a zero-length match, but I can't figure out how to change it.
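For comparison, here is a small check of how `re.findall` treats capturing groups. When the pattern contains a group, `findall` returns the group's text for each match rather than the whole match, and a group that did not take part in a match comes back as an empty string. The short input string here is chosen only for illustration.

```python
import re

text = 'men, to'

# One capturing group: findall returns group 1 for every match.
# When the [,.] alternative matches, group 1 did not participate, so '' is returned.
with_group = re.findall(r'(\w+)|[,.]', text)
print(with_group)     # ['men', '', 'to']

# No groups: findall returns the whole matched text for either alternative.
without_group = re.findall(r'\w+|[,.]', text)
print(without_group)  # ['men', ',', 'to']
```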
Code:
```python
#! python
import re
SENTENCE = 'Now is the time for all good men, to come to the aid of: their country.'
splitterRegex = re.compile(r'(\w+)|[,.:;?!]')
mo = splitterRegex.findall(SENTENCE)
print(mo)
```
Results:
```
['Now', 'is', 'the', 'time', 'for', 'all', 'good', 'men', '', 'to', 'come', 'to', 'the', 'aid', 'of', '', 'their', 'country', '']
```
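One possible fix, sketched under the assumption that only the words and punctuation marks are needed (whitespace is discarded): remove the capturing group, or make it non-capturing with `(?:...)`, so that `findall` returns the whole match for either alternative instead of the contents of group 1.

```python
import re

SENTENCE = 'Now is the time for all good men, to come to the aid of: their country.'

# No capturing group, so findall returns the full match for words and punctuation alike.
splitter = re.compile(r'\w+|[,.:;?!]')
tokens = splitter.findall(SENTENCE)
print(tokens)
# ['Now', 'is', 'the', 'time', 'for', 'all', 'good', 'men', ',', 'to', 'come',
#  'to', 'the', 'aid', 'of', ':', 'their', 'country', '.']
```

Since the spaces are not captured, reassembling the sentence from this list would need its own rule for where to put them back (for example, no space before a punctuation token).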