I am working on splitting a paragraph into sentences.
I searched around and found that NLTK's Punkt tokenizer generally splits sentences well, but I ran into one problem with the following code:
import nltk
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
summary = 'George Stanley McGovern (July 19, 1922 – October 21, 2012) was an American historian, author, U.S. Representative, U.S. Senator, and the Democratic Party presidential nominee in the 1972 presidential election.'
summary = sent_detector.tokenize(summary)
The result should be just one sentence. However, the tokenizer returns two, breaking after the second "U.S.":
['George Stanley McGovern (July 19, 1922 \x96 October 21, 2012) was an American historian, author, U.S. Representative, U.S.', 'Senator, and the Democratic Party presidential nominee in the 1972 presidential election.']
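One workaround that might be worth trying is to build a Punkt tokenizer from explicit parameters instead of the pretrained English model, and list "U.S." as a known abbreviation. The sketch below is only an assumption about what could help, not a confirmed diagnosis of why the pretrained model splits here. The names `params`, `tokenizer`, `text`, and `sentences` are illustrative, and the abbreviation set is deliberately minimal.

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

# Assumption: a hand-written abbreviation list. Entries are lowercase and
# written without the trailing period, so "U.S." becomes "u.s".
params = PunktParameters()
params.abbrev_types = {"u.s"}

# Passing a PunktParameters object (rather than training text) makes the
# tokenizer use those parameters directly; no pretrained model is loaded.
tokenizer = PunktSentenceTokenizer(params)

text = (
    "George Stanley McGovern (July 19, 1922 – October 21, 2012) was an "
    "American historian, author, U.S. Representative, U.S. Senator, and "
    "the Democratic Party presidential nominee in the 1972 presidential "
    "election."
)

sentences = tokenizer.tokenize(text)
print(len(sentences))
for sentence in sentences:
    print(sentence)
```

The trade-off is that a tokenizer built this way has no learned abbreviations, collocations, or sentence starters, so any other period-final token (for example "Dr." or "Inc.") would be treated as a sentence end unless it is added to `abbrev_types` as well.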