I'm trying to split a piece sample text into a list of sentences without delimiters and no spaces at the end of each sentence.
Sample text:
The first time you see The Second Renaissance it may look boring. Look at it at least twice and definitely watch part 2. It will change your view of the matrix. Are the human people the ones who started the war? Is AI a bad thing?
Into this (desired output):
['The first time you see The Second Renaissance it may look boring', 'Look at it at least twice and definitely watch part 2', 'It will change your view of the matrix', 'Are the human people the ones who started the war', 'Is AI a bad thing']
My code is currently:
def sent_tokenize(text):
sentences = re.split(r"[.!?]", text)
sentences = [sent.strip(" ") for sent in sentences]
return sentences
However this outputs (current output):
['The first time you see The Second Renaissance it may look boring', 'Look at it at least twice and definitely watch part 2', 'It will change your view of the matrix', 'Are the human people the ones who started the war', 'Is AI a bad thing', '']
Notice the extra '' on the end.
Any ideas on how to remove the extra '' at the end of my current output?