
I am new to Python and NLTK. I want to tokenize a string and add a few strings to the split list in NLTK. I used the code from the post How to tweak the NLTK sentence tokenizer. Below is the code I have written:

import nltk
from nltk.tokenize import sent_tokenize
extra_abbreviations = ['\n']
sentence_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
sentence_tokenizer._params.abbrev_types.update(extra_abbreviations)

sent_tokenize_list = sentence_tokenizer(document)
sent_tokenize_list

This gives me the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      4 sentence_tokenizer._params.abbrev_types.update(extra_abbreviations)
      5
----> 6 sent_tokenize_list = sentence_tokenizer(document)
      7 sent_tokenize_list

TypeError: 'PunktSentenceTokenizer' object is not callable

How do I fix this?

swetha
  • Hopefully, this helps: http://stackoverflow.com/a/35279885/610569 and https://github.com/alvations/DLTK/blob/master/dltk/tokenize/tokenizer.py#L49 – alvas May 09 '16 at 08:48

1 Answer


This makes your example work:

import nltk
from nltk.tokenize import sent_tokenize
extra_abbreviations = ['\n']
sentence_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
sentence_tokenizer._params.abbrev_types.update(extra_abbreviations)
document = """This is my test doc. It has two sentences; however, one of which has interesting punctuation."""
sent_tokenize_list = sentence_tokenizer.tokenize(document)
print(sent_tokenize_list)

Your error occurs because sentence_tokenizer is an object, not a function, so you cannot call it directly. You have to call its tokenize method instead.
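As an aside, entries in abbrev_types are expected to be lowercase abbreviations without the trailing period (e.g. 'mr' for "Mr."), so '\n' won't have the effect you may be hoping for. A minimal sketch of how an abbreviation changes the split, using an untrained PunktSentenceTokenizer so no pickle download is needed:

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

tokenizer = PunktSentenceTokenizer()
text = "Mr. Smith arrived. He sat down."

# Without the abbreviation registered, punkt may break after "Mr."
print(tokenizer.tokenize(text))

# Register "Mr." as an abbreviation: lowercase, no trailing period
tokenizer._params.abbrev_types.add('mr')
print(tokenizer.tokenize(text))  # "Mr." no longer ends a sentence
```

Note that _params is a private attribute; this pattern is common in answers on this topic but is not an official API.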

You can learn how to find out more about the capabilities of objects in the Python docs.
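For instance, the built-in dir() lists an object's attributes, which is a quick way to discover methods such as tokenize on an unfamiliar object:

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

tokenizer = PunktSentenceTokenizer()

# List the object's public attributes and methods
methods = [name for name in dir(tokenizer) if not name.startswith('_')]
print(methods)  # includes 'tokenize', among others

# help(tokenizer.tokenize) would show its signature and docstring
```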

thorsten