I am developing a regex to find sentences, and I would like it to ignore abbreviations that make the match end before the sentence does. For example, I want it to skip over "a.m." so that it returns "At 9:00 a.m. the store opens." instead of "At 9:00 a.m.". Here is my current function:
import re

def sentence_finder(x):
    # Intended: match whole sentences without stopping at the abbreviation "a.m."
    regex_object = re.compile(r'[A-Z].+?\b(?!a\.m\.\b)\w+[.?!](?!\S)')
    return regex_object.findall(x)
I get back the following when I run pytest:
    def test_pass_Ignore_am():
>       assert DuplicateSentences.sentence_finder("At 9:00 a.m. the store opens.") == ["At 9:00 a.m. the store opens."]
E       AssertionError: assert ['At 9:00 a.m.'] == ['At 9:00 a.m...store opens.']
E         At index 0 diff: 'At 9:00 a.m.' != 'At 9:00 a.m. the store opens.'
What am I doing wrong?
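A possible diagnosis, offered as a sketch rather than a definitive answer: the negative lookahead `(?!a\.m\.\b)` is tested at the word boundary just before `\w+`. When `\w+` matches the `m` of "a.m.", the text at that position is "m. the", not "a.m.", so the lookahead passes and the period after `m` is accepted as the end of the sentence. One way to try around this is to move the check to a negative lookbehind directly in front of the terminator. The function name `sentence_finder_lookbehind` and the constant `SENTENCE_RE` below are made up for illustration, and the pattern only handles the single abbreviation "a.m.".

```python
import re

# Hypothetical variant of sentence_finder (names are illustrative only).
# The negative lookbehind sits directly before the terminator, so a period
# that is preceded by the word "a.m" is not treated as a sentence end.
SENTENCE_RE = re.compile(r'[A-Z].+?(?<!\ba\.m)[.?!](?!\S)')

def sentence_finder_lookbehind(text):
    # Return every non-overlapping match as a list of strings.
    return SENTENCE_RE.findall(text)

print(sentence_finder_lookbehind("At 9:00 a.m. the store opens."))
```

A known limitation of this sketch: a sentence that actually ends with "a.m." (for example "The store opens at 9:00 a.m.") is not matched at all, because its only terminator is rejected. Handling that case, or further abbreviations such as "p.m.", would need additional lookbehinds or a different approach.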