I'm trying to split a text into sentences whenever a terminal punctuation mark ('.', '!', '?') appears. For instance, if I have the following text:
Recognizing the rising opportunity Jerusalem Venture Partners opened up their Cyber Labs incubator, giving a home to many of the city’s promising young companies. International corporates like EMC have also established major centers in the park, leading the way for others to follow! On a visit last June, the park had already grown to two buildings with the ground being broken for the construction of more in the near future. this is really interesting! what do you think?
This should be split into 5 sentences (see the bold words above, as these words end with a punctuation mark).
Here's my code:
import re

# split after one or more '.'
splitted_article_content = []
# article_content contains all the article's paragraphs
for element in article_content:
    splitted_article_content = splitted_article_content + re.split(r"(?<=\.)\s+", element)
# split after one or more '?'
splitted_article_content_2 = []
for element in splitted_article_content:
    splitted_article_content_2 = splitted_article_content_2 + re.split(r"(?<=\?)\s+", element)
# split after one or more '!'
splitted_article_content_3 = []
for element in splitted_article_content_2:
    splitted_article_content_3 = splitted_article_content_3 + re.split(r"(?<=!)\s+", element)
My question is: is there a more efficient way to do this, without using any external libraries?
Thanks for the help guys.
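
For reference, one possible single-pass version is sketched below. It relies only on the standard-library `re` module and assumes that a sentence ends wherever '.', '!' or '?' is followed by whitespace. The names `split_sentences` and `SENTENCE_END` are made up for illustration. This is a rough sketch, not a robust tokenizer: abbreviations such as "e.g. " would still be split incorrectly.

```python
import re

# Hypothetical helper pattern: whitespace that directly follows '.', '!' or '?'.
# The lookbehind keeps the punctuation attached to the sentence it ends.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(paragraphs):
    """Return a flat list of sentences from an iterable of paragraph strings."""
    sentences = []
    for paragraph in paragraphs:
        pieces = SENTENCE_END.split(paragraph.strip())
        # Skip empty strings, which appear when a paragraph is empty.
        sentences.extend(piece for piece in pieces if piece)
    return sentences
```

Calling `split_sentences(article_content)` would replace the three loops with one pass over the paragraphs, since a single character class covers all three punctuation marks.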