I am using NLTK's default tagger to get the POS tag of a word, but I am not getting the expected results:
>>> nltk.pos_tag(nltk.tokenize.word_tokenize("I want a watch"))
[('I', 'PRP'), ('want', 'VBP'), ('a', 'DT'), ('watch', 'NN')]
>>> nltk.pos_tag(nltk.tokenize.word_tokenize("Lets watch a movie"))
[('Lets', 'NNS'), ('watch', 'VBP'), ('a', 'DT'), ('movie', 'NN')]
As you can see above, the pos_tag function correctly tags the word watch. But in the case below:
>>> nltk.pos_tag(nltk.tokenize.word_tokenize("I want to read a book"))
[('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('read', 'VB'), ('a', 'DT'), ('book', 'NN')]
>>> nltk.pos_tag(nltk.tokenize.word_tokenize("I want to book a ticket"))
[('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('book', 'NN'), ('a', 'DT'), ('ticket', 'NN')]
It incorrectly predicts the tag for the word book: in the second sentence it is used as a verb but is tagged NN.
I know we can build a custom tagger, but I would prefer not to build a tagger from scratch just for one word. I am looking to improve the accuracy of the tagger for the word book. I referred to this answer, but the latest version doesn't seem to have the attribute nltk.tag._POS_TAGGER.
Is there any possible workaround for this?
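
For illustration, one kind of workaround I have in mind is a small post-processing pass over the output of pos_tag, rather than a new tagger. The sketch below is only an assumption about how that might look: retag_after_to and AMBIGUOUS_VERBS are hypothetical names, not part of NLTK, and the rule (a listed word tagged as a noun directly after a TO token becomes VB) is a hand-written heuristic, not something the tagger provides.

```python
# Hypothetical post-processing helper; not part of NLTK.
# Words that the default tagger is known to mis-tag as nouns after "to".
AMBIGUOUS_VERBS = {"book"}

def retag_after_to(tagged, words=AMBIGUOUS_VERBS):
    """Return a copy of `tagged` in which a listed word that was tagged
    as a noun directly after a 'TO' token is retagged as VB."""
    fixed = []
    for i, (word, tag) in enumerate(tagged):
        prev_tag = tagged[i - 1][1] if i > 0 else None
        if word.lower() in words and prev_tag == "TO" and tag.startswith("NN"):
            tag = "VB"
        fixed.append((word, tag))
    return fixed

# Tagger output copied from the example above.
tagged = [('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'),
          ('book', 'NN'), ('a', 'DT'), ('ticket', 'NN')]
print(retag_after_to(tagged))
```

This only touches the listed words, so "I want to read a book" is left unchanged. Because the Penn Treebank tagset also uses TO for the preposition "to", the rule could misfire on a phrase such as "went to book stores", so I am not sure it is robust enough.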