I followed everything on this thread, yet I was unable to use NLTK on Google App Engine.

I desperately need NLTK on GAE, please help. I am facing the following problem.

>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', '...', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
>>> tagged = nltk.pos_tag(tokens)

Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    tagged = nltk.pos_tag(tokens)
  File "C:\Python27\lib\site-packages\nltk\tag\__init__.py", line 99, in pos_tag
    tagger = load(_POS_TAGGER)
  File "C:\Python27\lib\site-packages\nltk\data.py", line 605, in load
    resource_val = pickle.load(_open(resource_url))
  File "C:\Python27\lib\site-packages\nltk\data.py", line 686, in _open
    return find(path).open()
  File "C:\Python27\lib\site-packages\nltk\data.py", line 467, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not
  found.  Please use the NLTK Downloader to obtain the resource:
  >>> nltk.download()
  Searched in:
    - 'C:\\Users\\Anshu/nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'C:\\Python27\\nltk_data'
    - 'C:\\Python27\\lib\\nltk_data'
    - 'C:\\Users\\Anshu\\AppData\\Roaming\\nltk_data'
**********************************************************************
>>> 
shad0w_wa1k3r
tenstar
  • Possible duplicate of [1](http://stackoverflow.com/questions/14089887/nltk-pos-tag-usage) & [2](http://stackoverflow.com/questions/4867197/failed-loading-english-pickle-with-nltk-data-load) – shad0w_wa1k3r Nov 03 '13 at 03:48
  • @AshishNitinPatil Nope, don't think it's a duplicate of those. This one is GAE-specific. HOWEVER, it is a possible duplicate of this [link](http://stackoverflow.com/questions/1286301/using-the-python-nltk-2-0b5-on-the-google-app-engine?rq=1). – Truerror Nov 03 '13 at 06:53
  • You have a bunch of issues. First, the nltk lib must be installed in the App Engine project directory so it can be deployed with the code, so anything you do outside it (i.e. your shell example above) is not relevant to the App Engine installation. Secondly, any other resource nltk needs will also have to be manually installed in your App Engine project. – Tim Hoffman Nov 03 '13 at 08:31

2 Answers

In case somebody is looking for a quick answer (the English tokenizer model is small enough to fit on Google App Engine):

  1. download Punkt Tokenizer Models from http://www.nltk.org/nltk_data
  2. create a directory named nltk_data/tokenizers/punkt/PY3 in the directory where your app.yaml is located
  3. extract english.pickle from the PY3 directory in the Punkt Tokenizer Models file (punkt.zip)
  4. copy english.pickle to ./nltk_data/tokenizers/punkt/PY3/
  5. add the following lines to app.yaml:

       env_variables:
         NLTK_DATA: './nltk_data/'
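
The steps above could also be checked from code. The following is only a rough sketch, not part of the original answer: the helper names (`bundled_data_dir`, `register_data_dir`, `missing_resources`) are made up for illustration, and it assumes the `nltk_data` directory sits next to app.yaml as described in the steps. It relies on `nltk.data.path`, the list of directories NLTK searches for resources.

```python
import os


def bundled_data_dir(app_root):
    """Return the nltk_data directory that sits next to app.yaml."""
    return os.path.join(os.path.abspath(app_root), "nltk_data")


def register_data_dir(search_path, data_dir):
    """Put data_dir at the front of an NLTK-style search path list, once."""
    if data_dir not in search_path:
        search_path.insert(0, data_dir)
    return search_path


def missing_resources(data_dir, resources):
    """List the relative resource paths that do not exist under data_dir."""
    return [r for r in resources
            if not os.path.exists(os.path.join(data_dir, *r.split("/")))]


if __name__ == "__main__":
    # Assumption: the script is started from the directory containing app.yaml.
    data_dir = bundled_data_dir(os.getcwd())
    # Resource path taken from the steps above.
    print(missing_resources(data_dir, ["tokenizers/punkt/PY3/english.pickle"]))
    try:
        import nltk
    except ImportError:
        nltk = None
    if nltk is not None:
        # nltk.data.path is the list of directories NLTK searches.
        register_data_dir(nltk.data.path, data_dir)
```

Registering the directory in code is an alternative to the `NLTK_DATA` environment variable. The `missing_resources` check can help spot a file that was copied to the wrong place before deploying.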
chabir

1) Go to your cloud console

2) Run the following commands:

pip install -U textblob
python -m textblob.download_corpora

This downloads the NLTK data together with the corpora. Then redeploy your app and it should work.
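
If the data needs to live inside the project directory so that it is uploaded with the code (as the comment from Tim Hoffman suggests), something along these lines might work. This is a hedged sketch rather than a tested recipe: `fetch_into_project` is a hypothetical helper, the `download` argument is assumed to behave like `nltk.download` (which accepts a `download_dir` keyword), and the tagger id is inferred from the resource path in the traceback, so it may differ for other NLTK versions.

```python
import os

# Resource ids as used by the NLTK downloader. The tagger id is inferred from
# the path in the traceback and is an assumption for other NLTK versions.
RESOURCES = ["punkt", "maxent_treebank_pos_tagger"]


def fetch_into_project(app_root, resources, download):
    """Download each resource into <app_root>/nltk_data.

    `download` is expected to behave like nltk.download, i.e. to accept
    a resource id and a download_dir keyword argument.
    Returns the directory that was used.
    """
    target = os.path.join(app_root, "nltk_data")
    if not os.path.isdir(target):
        os.makedirs(target)
    for resource_id in resources:
        download(resource_id, download_dir=target)
    return target
```

With NLTK installed locally, this would be called once before deploying, for example as `fetch_into_project(os.getcwd(), RESOURCES, nltk.download)` from the directory that contains app.yaml.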

Manish Sharma