3

I'm trying to learn NLP and sentiment analysis in Python and came across NLTK. I worked through a few tutorials but got stuck on the tokenization function: it doesn't work for me, and the command line says I don't have the required resources.

I already tried installing punkt, and although it downloaded, the command line still shows the same error:

Resource u'taggers/maxent_treebank_pos_tagger/english.pickle'
not found.  Please use the NLTK Downloader to obtain the
resource:  >>> nltk.download()
Searched in:
  - 'C:\\Users\\JeromePogi/nltk_data'
  - 'C:\\nltk_data'
  - 'D:\\nltk_data'
  - 'E:\\nltk_data'
  - 'C:\\Python27\\nltk_data'
  - 'C:\\Python27\\lib\\nltk_data'
  - 'C:\\Users\\JeromePogi\\AppData\\Roaming\\nltk_data'
  - u''

I've tried everything I can think of, including putting the nltk_data folder in each of the directories it searched, but to no avail. How can I resolve this error?

orome
Jerome Ibañez
  • possible duplicate of [Failed loading english.pickle with nltk.data.load](http://stackoverflow.com/questions/4867197/failed-loading-english-pickle-with-nltk-data-load) – alvas Sep 13 '15 at 03:56
  • `import nltk; nltk.download('all')` – alvas Sep 13 '15 at 03:57
  • Not a duplicate, it is missing a different resource. @Alvas, it's enough to recommend `nltk.download('book')` as a catch-all if you don't know the specific resource that is missing. (Or to avoid similar problems in the future.) – alexis Sep 13 '15 at 14:28
  • I like the "batteries-included" solutions =) – alvas Sep 13 '15 at 21:41
  • `download('book')` includes enough batteries to last most users forever. – alexis Sep 14 '15 at 16:05
  • @JermoeIbanez, would you maybe accept an answer to mark the question as resolved? This will help other users as well. – MERose Apr 14 '16 at 13:24

3 Answers

6

Try installing `maxent_treebank_pos_tagger` with `nltk.download()` from the Python console.
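A minimal sketch of doing this from a script rather than the interactive downloader. `ensure_nltk_resource` is a hypothetical helper name (not part of NLTK); it relies only on `nltk.data.find`, which raises `LookupError` for a missing resource, and `nltk.download`. The resource path and package name are taken from the error message in the question; later NLTK releases ship a different default tagger, so use whatever name your own error message reports.

```python
def ensure_nltk_resource(resource_path, package_id, nltk_module=None):
    """Download `package_id` only if `resource_path` cannot be found.

    Returns True if a download was triggered, False if the resource
    was already available. `nltk_module` exists only so the helper can
    be exercised without NLTK installed; normally leave it as None.
    """
    if nltk_module is None:
        import nltk as nltk_module  # lazy import: requires NLTK at call time
    try:
        nltk_module.data.find(resource_path)
        return False
    except LookupError:
        # Same effect as picking the package in the nltk.download() GUI.
        nltk_module.download(package_id)
        return True


# Usage (assumes NLTK is installed and the machine has network access):
# ensure_nltk_resource(
#     'taggers/maxent_treebank_pos_tagger/english.pickle',
#     'maxent_treebank_pos_tagger',
# )
```

Checking first keeps the script from hitting the network on every run.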

  • The possible duplicate (which is from 3-4 years ago) suggests `nltk.download('punkt')` - what is the difference? – Darren Cook Sep 13 '15 at 10:21
  • The `punkt` tokenizer model is used to break up plain text into sentences, and is used when the NLTK reads plain text corpora. For POS tagging, you need the maxent tagger's model. – alexis Sep 13 '15 at 14:26
4

From the shell/terminal/cmd, you can use:

python -m nltk.downloader maxent_treebank_pos_tagger

(On Linux you might need to run it with sudo.)

It will install maxent_treebank_pos_tagger (i.e. the standard treebank POS tagger in NLTK) and fix your issue.
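If the error persists after downloading, it may help to check whether the file actually sits under one of the directories listed in the "Searched in:" part of the error. The following is a rough stdlib-only sketch; `find_resource` is a hypothetical helper, not an NLTK function, and it only looks for unpacked files, whereas NLTK itself can also read zipped packages.

```python
import os


def find_resource(resource, search_dirs):
    """Return the directories from `search_dirs` that contain `resource`.

    `resource` uses forward slashes, exactly as printed in the NLTK
    error message. Empty entries (like the trailing u'' in the error)
    are skipped. Simplification: zipped packages are not inspected.
    """
    parts = resource.split('/')
    hits = []
    for base in search_dirs:
        if base and os.path.exists(os.path.join(base, *parts)):
            hits.append(base)
    return hits


# Usage, with two of the directories from the error message:
# find_resource(
#     'taggers/maxent_treebank_pos_tagger/english.pickle',
#     [r'C:\nltk_data', r'C:\Python27\nltk_data'],
# )
```

An empty result means none of the given folders holds the unpacked tagger model, regardless of whether other packages such as punkt are present there.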

Franck Dernoncourt
0

In my case the problem was that I didn't understand how to pass the language as a parameter. My code was:

word_tokenize('So was he doing.', 'en')

This is WRONG. Use the full language name instead, e.g. `'english'` rather than `'en'`.

soshial