I'm trying to extract sentences from a paragraph using regular expressions in Python.
The code I'm testing usually splits the sentences correctly, but for the following paragraph it does not.
The paragraph:
"But in the case of malaria infections and sepsis, dendritic cells throughout the body are concentrated on alerting the immune system, which prevents them from detecting and responding to any new infections." A new type of vaccine?
The code:
    import re

    def splitParagraphIntoSentences(paragraph):
        # Split after ., ! or ? when followed by 1-2 whitespace characters and a capital letter
        sentenceEnders = re.compile(r'[.!?][\s]{1,2}(?=[A-Z])')
        sentenceList = sentenceEnders.split(paragraph)
        return sentenceList

    if __name__ == '__main__':
        # Read the whole paragraph from the file
        with open("bs.txt", 'r') as f:
            text = f.read()
        mylist = []
        sentences = splitParagraphIntoSentences(text)
        for s in sentences:
            mylist.append(s.strip())
        for i in mylist:
            print(i)
When tested with the above paragraph, the output is identical to the input paragraph, but it should look like this:
But in the case of malaria infections and sepsis, dendritic cells throughout the body are concentrated on alerting the immune system, which prevents them from detecting and responding to any new infections
A new type of vaccine
Is there anything wrong with the regular expression?
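
For reference, the pattern requires whitespace immediately after the `.`, `!` or `?`. In this paragraph the period is followed by a closing double quote, and the final `?` is at the end of the text with no capital letter after it, so neither position matches. Below is a minimal sketch of a variant that would produce the desired output, assuming an optional straight double quote may follow the terminator and that the end of the text also ends a sentence. The names `SENTENCE_END` and `split_sentences` are hypothetical, and typographic (curly) quotes are not handled.

```python
import re

# Assumption: a sentence ends with ., ! or ?, optionally followed by a
# straight double quote, then either whitespace before a capital letter
# or the end of the text.
SENTENCE_END = re.compile(r'[.!?]"?(?:\s+(?=[A-Z])|\s*$)')

def split_sentences(paragraph):
    """Split a paragraph into sentences, dropping terminators and outer quotes."""
    parts = SENTENCE_END.split(paragraph)
    # Discard empty pieces (e.g. after the final terminator) and strip
    # surrounding whitespace and straight double quotes.
    return [p.strip().strip('"') for p in parts if p.strip()]

if __name__ == '__main__':
    text = ('"But in the case of malaria infections and sepsis, dendritic cells '
            'throughout the body are concentrated on alerting the immune system, '
            'which prevents them from detecting and responding to any new '
            'infections." A new type of vaccine?')
    for sentence in split_sentences(text):
        print(sentence)
```

This is only one way to relax the pattern. Abbreviations such as "Dr. Smith" would still be split incorrectly, just as with the original expression.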