NLTK's PunktSentenceTokenizer doesn't find sentence boundaries properly in the examples below.
from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktParameters
punkt_param = PunktParameters()
punkt_param.abbrev_types = set(['dr', 'vs', 'mr', 'mrs', 'prof', 'inc', 'rev'])
sentence_splitter = PunktSentenceTokenizer(punkt_param)
sentence_splitter.tokenize(u'In that paper, "Has Financial Development Made the World Riskier?", Rajan "argued that disaster might loom." ')
Output:
[u'In that paper, "Has Financial Development Made the World Riskier?"',
u', Rajan "argued that disaster might loom."']
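To see what drives the split, here is a diagnostic sketch, assuming a recent NLTK (3.x) where PunktSentenceTokenizer exposes debug_decisions(); it yields one dict of features per candidate boundary, which might show why the quoted '?' is treated as a sentence end:

text = u'In that paper, "Has Financial Development Made the World Riskier?", Rajan "argued that disaster might loom." '
for decision in sentence_splitter.debug_decisions(text):
    # each decision is a plain dict describing one candidate break
    print(decision)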
Another example:
sentence_splitter.tokenize(u'Don "Don C." Crowley')
Output:
[u'Don "Don C."', u'Crowley']
Neither input should be split into two sentences. Is there any way to handle this?
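For reference, a rough post-merge sketch (merge_quote_fragments is my own hypothetical helper, not anything in NLTK) that re-joins a fragment when it begins with punctuation or a lowercase letter; it would cover the first example but not the "Crowley" one, which is why a Punkt-level fix would be preferable:

import re

def merge_quote_fragments(sentences):
    # Hypothetical post-processing helper: if a "sentence" starts with
    # punctuation or a lowercase letter, it probably isn't a real sentence
    # start, so glue it back onto the previous one (spacing is approximate).
    merged = []
    for s in sentences:
        if merged and re.match(r'[,;:a-z]', s):
            merged[-1] = merged[-1] + ' ' + s
        else:
            merged.append(s)
    return merged

print(merge_quote_fragments(sentence_splitter.tokenize(
    u'In that paper, "Has Financial Development Made the World Riskier?", Rajan "argued that disaster might loom." ')))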