Python NLTK error: english.pickle resource in NLTK not found

Question

I'm trying to learn NLP and sentiment analysis in Python and came across NLTK. I worked through a few tutorials but got stuck on the tokenization function, which does not work for me (the command line says that I don't have the required resources).

I already tried installing punkt, and although it downloaded successfully, the command line still shows the same error:

Resource u'taggers/maxent_treebank_pos_tagger/english.pickle'
not found.  Please use the NLTK Downloader to obtain the
resource:  >>> nltk.download()
Searched in:
  - 'C:\\Users\\JeromePogi/nltk_data'
  - 'C:\\nltk_data'
  - 'D:\\nltk_data'
  - 'E:\\nltk_data'
  - 'C:\\Python27\\nltk_data'
  - 'C:\\Python27\\lib\\nltk_data'
  - 'C:\\Users\\JeromePogi\\AppData\\Roaming\\nltk_data'
  - u''

I've tried everything I can think of, including putting the nltk_data folder in each of the directories it searched, but to no avail. What can I do to resolve this error?

possible duplicate of [Failed loading english.pickle with nltk.data.load](http://stackoverflow.com/questions/4867197/failed-loading-english-pickle-with-nltk-data-load) — alvas, Sep 13 '15 at 03:56
Not a duplicate, it is missing a different resource. @Alvas, it's enough to recommend `nltk.download('book')` as a catch-all if you don't know the specific resource that is missing. (Or to avoid similar problems in the future.) — alexis, Sep 13 '15 at 14:28
`download('book')` includes enough batteries to last most users forever. — alexis, Sep 14 '15 at 16:05
@JermoeIbanez, would you maybe accept an answer to mark the question as resolved? This will help other users as well. — MERose, Apr 14 '16 at 13:24

score 6 · Answer 1 · answered Sep 12 '15 at 13:13

Try installing `maxent_treebank_pos_tagger` using `nltk.download()` in the Python console.
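
A minimal sketch of the same step as a script, assuming NLTK is installed. It looks up the resource named in the error message with `nltk.data.find` and only calls `nltk.download` when that lookup fails. The helper name `ensure_resource` is made up for this illustration and is not part of NLTK.

```python
def ensure_resource(resource_path, package):
    """Download `package` only if `resource_path` cannot be found.

    `ensure_resource` is a hypothetical helper, not an NLTK function.
    Returns True if the resource was already present, otherwise the
    value returned by nltk.download().
    """
    # Imported here so a missing NLTK install fails at call time,
    # not when this module is imported.
    import nltk

    try:
        nltk.data.find(resource_path)  # raises LookupError when missing
        return True
    except LookupError:
        return nltk.download(package)
```

Calling `ensure_resource('taggers/maxent_treebank_pos_tagger/english.pickle', 'maxent_treebank_pos_tagger')` before tagging should then avoid the error from the question. The error names a tagger model, which is why downloading `punkt` alone did not help.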

markos.aivazoglou

The possible duplicate (which is from 3-4 years ago) suggests `nltk.download('punkt')` - what is the difference? – Darren Cook Sep 13 '15 at 10:21
The `punkt` tokenizer model is used to break up plain text into sentences, and is used when the NLTK reads plain text corpora. For POS tagging, you need the maxent tagger's model. – alexis Sep 13 '15 at 14:26

score 4 · Answer 2 · answered Jan 20 '16 at 02:59

From the shell/terminal/cmd, you can use:

python -m nltk.downloader maxent_treebank_pos_tagger

(You might need to run it with `sudo` on Linux.)

It will install maxent_treebank_pos_tagger (i.e. the standard treebank POS tagger in NLTK) and fix your issue.
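
When several Python installations are present, it may help to run the downloader with the same interpreter that runs your script. The sketch below builds that command from `sys.executable`. The helper names `downloader_command` and `run_downloader` are hypothetical; the optional `-d` argument is the downloader's flag for choosing a target directory, which may remove the need for `sudo`.

```python
import subprocess
import sys


def downloader_command(package, download_dir=None):
    """Build the `python -m nltk.downloader` command for `package`.

    Hypothetical helper for illustration; not part of NLTK.
    """
    cmd = [sys.executable, "-m", "nltk.downloader"]
    if download_dir is not None:
        # -d selects the directory the package is downloaded into.
        cmd += ["-d", download_dir]
    cmd.append(package)
    return cmd


def run_downloader(package, download_dir=None):
    """Run the downloader and return its exit code (requires NLTK)."""
    return subprocess.run(downloader_command(package, download_dir)).returncode
```

For example, `run_downloader('maxent_treebank_pos_tagger')` runs the command shown above. A directory chosen with `-d` has to be one that NLTK searches, such as those listed in the error message, or be added through the `NLTK_DATA` environment variable.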

Franck Dernoncourt

score 0 · Answer 3 · answered Mar 26 '18 at 19:08

In my case, the problem was that I hadn't realised how to pass the language as a parameter. My code was:

word_tokenize('So was he doing.', 'en')

This is WRONG. Use the full language name (for example, 'english') instead of a two-letter code.
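
As a sketch of the fix: `word_tokenize` takes the language as a full lowercase name such as `'english'`, which NLTK uses to locate the matching Punkt model. The lookup table and the helper names `punkt_language` and `tokenize` below are hypothetical conveniences, not part of NLTK, and the table covers only a few languages.

```python
# Hypothetical table: two-letter code -> language name used by Punkt models.
PUNKT_LANGUAGES = {
    "en": "english",
    "de": "german",
    "fr": "french",
    "es": "spanish",
    "it": "italian",
}


def punkt_language(code_or_name):
    """Map a two-letter code to a full name; pass other values through lowercased."""
    key = code_or_name.strip().lower()
    return PUNKT_LANGUAGES.get(key, key)


def tokenize(text, language="en"):
    """Tokenize `text`, accepting either 'en' or 'english' (requires NLTK and punkt)."""
    # Imported here so punkt_language() can be used without NLTK installed.
    from nltk.tokenize import word_tokenize

    return word_tokenize(text, language=punkt_language(language))
```

With this, `tokenize('So was he doing.', 'en')` ends up calling `word_tokenize('So was he doing.', language='english')`.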

soshial
