I want to open a file and split its text into sentences. The sentences in the file run across line breaks, like this:
"He said, 'I'll pay you five pounds a week if I can have it on my own
terms.' I'm a poor woman, sir, and Mr. Warren earns little, and the
money meant much to me. He took out a ten-pound note, and he held it
out to me then and there.
Currently I'm using this code:
text = ' '.join(file_to_open.readlines())
sentences = re.split(r' *[\.\?!][\'"\)\]]* *', text)
readlines() cuts through the sentences. Is there a good way to solve this and get only the sentences (without NLTK)?
Thanks for your attention.
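One possible approach, sketched here under assumptions: read the text as a single string, collapse every run of whitespace (including the newlines that readlines() leaves in place) into one space, and only then apply the split pattern from the code above. The names split_sentences and sample are hypothetical, chosen for illustration. Abbreviations such as "Mr." would still be treated as sentence ends by this pattern.

```python
import re

def split_sentences(text):
    # Collapse all whitespace runs (spaces, tabs, newlines) into single
    # spaces, so a line break inside a sentence no longer matters.
    flat = ' '.join(text.split())
    # Same idea as the pattern in the question: a terminator, optional
    # closing quotes or brackets, and any surrounding spaces.
    parts = re.split(r' *[.?!][\'")\]]* *', flat)
    # Drop the empty string left over when the text ends with a terminator.
    return [p for p in parts if p]

# Hypothetical sample with line breaks inside the sentences.
sample = ("He took out a ten-pound note, and he held it\n"
          "out to me then and there. The money meant\n"
          "much to me!")
for sentence in split_sentences(sample):
    print(sentence)
```

With a real file, the same function could be called as split_sentences(f.read()) inside a with open(...) block.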
The current problem:
import re
file_to_read = 'test.txt'
with open(file_to_read) as f:
    text = f.read()
word_list = ['Mrs.', 'Mr.']
for i in word_list:
    text = re.sub(i, i[:-1], text)
What I get back (in the test case) is that Mrs. is changed to Mr, while Mr. becomes just Mr. I tried several other things, but they don't seem to work. The answer is probably easy, but I'm missing it.
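A likely cause: re.sub treats its first argument as a regular expression, and an unescaped dot matches any character. The first pass turns "Mrs." into "Mrs", and then the pattern 'Mr.' matches "Mrs" and replaces it with "Mr". A minimal sketch of one way around this is to escape each abbreviation with re.escape so the dot is matched literally. The function name strip_abbreviation_dots and the sample string are hypothetical.

```python
import re

def strip_abbreviation_dots(text, abbreviations=('Mrs.', 'Mr.')):
    for abbr in abbreviations:
        # re.escape turns 'Mr.' into a pattern where the dot is literal,
        # so it can no longer match the 's' in 'Mrs'.
        text = re.sub(re.escape(abbr), abbr[:-1], text)
    return text

print(strip_abbreviation_dots("Mrs. Warren and Mr. Warren"))
# prints: Mrs Warren and Mr Warren
```

Since no regex features are needed here, text.replace(abbr, abbr[:-1]) would likely work as well and avoids the escaping question entirely.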