I have a Django application that I deployed by following the guide below:

https://cloud.google.com/python/django/flexible-environment

But because I am using NLTK for text processing, I am getting the error below.

*********************************************************************
  Resource 'taggers/maxent_treebank_pos_tagger/PY3/english.pickle'
  not found.  Please use the NLTK Downloader to obtain the
  resource:  >>> nltk.download()
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''

So I know that I am missing NLTK data. I have looked through a lot of code online, but I could not find a way to download the data on Google App Engine. Below is my requirements.txt for reference.

Django==1.10.6
gunicorn==19.7.0
nltk==3.0.5

Please let me know if there is a way to do it. Thanks in advance.

displayname
  • See https://stackoverflow.com/questions/22211525/how-do-i-download-nltk-data – alvas May 26 '17 at 01:13
  • Dear alvas, thank you for the reply. But the post you shared is about installing NLTK data in general. I want to install it on Google App Engine on Google Cloud. Thanks for your help though. – displayname May 26 '17 at 03:52
  • Isn't it the same? Read the answers carefully: you can set the paths where you download or read the `nltk_data` directory. Is there a static disk where you keep your assets on App Engine, or is it a serverless backend? If it's a microserver, then I think `nltk` might not function properly unless there's a cloud NAS that the App Engine instance is linked to. – alvas May 26 '17 at 04:00
  • Also, update your NLTK, the latest `nltk` should no longer be using the `maxent` model ;P v 3.0.5 is really toooo low for any serious usage, it should be v 3.2.4 – alvas May 26 '17 at 04:00
  • Yes alvas, you're right. There is no static disk that I can go and modify. That is the reason I am not able to download the data. – displayname May 27 '17 at 02:56
  • Dear @alvas, check my answer for a workaround to get the NLTK data. – displayname May 27 '17 at 06:11
  • In general, put nltk_data in your [static assets](https://docs.djangoproject.com/en/1.11/howto/static-files/), but that's actually not advisable either. Set up an NFS; you will need it in the future if you're serious about deploying NLP solutions with Django. Then link the NFS to your static asset directory with a symbolic link. I hope it helps. – alvas May 27 '17 at 14:11
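
The suggestion in the comments (choose where `nltk_data` is downloaded and read from) could look roughly like the sketch below. It is only an illustration, not something tested on App Engine: the helper name `ensure_nltk_resource`, the `/tmp/nltk_data` location, and the assumption that this directory is writable at runtime are all hypothetical. It relies on `nltk.download()` accepting a `download_dir` argument and on `nltk.data.path` being a plain list of search directories.

```python
import os


def ensure_nltk_resource(data_dir, package_id, marker_relpath, download=None):
    """Download package_id into data_dir unless marker_relpath already exists there.

    Returns True if a download was triggered, False if the data was already present.
    """
    marker = os.path.join(data_dir, marker_relpath)
    if os.path.exists(marker):
        return False  # already downloaded, nothing to do

    if download is None:
        # Imported lazily so the helper can be exercised without NLTK installed.
        import nltk
        download = nltk.download

    os.makedirs(data_dir, exist_ok=True)
    download(package_id, download_dir=data_dir)
    return True


# Possible usage at application startup (paths are assumptions):
#
#   import nltk
#   DATA_DIR = "/tmp/nltk_data"
#   ensure_nltk_resource(DATA_DIR, "maxent_treebank_pos_tagger",
#                        os.path.join("taggers", "maxent_treebank_pos_tagger"))
#   if DATA_DIR not in nltk.data.path:
#       nltk.data.path.append(DATA_DIR)
```

Downloading at startup costs time on every fresh instance and needs outbound network access, so shipping the data with the app may be the simpler choice when the data set is small.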

1 Answer

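I worked around this by bundling the NLTK data with the app. First, I copied the required NLTK data files into a folder inside my Django app, keeping the same subfolder layout that NLTK looks for (for example taggers/maxent_treebank_pos_tagger/...). Then, in settings.py, I defined a setting that points to that folder. The name is uppercase because django.conf.settings only exposes uppercase settings.

NLTK_DIR = os.path.join(BASE_DIR, 'first_app', 'nltk_data')

In the module where I use NLTK, I pass that setting to nltk.data.path.append(), which adds the folder to the list of paths that NLTK searches (the list defined in data.py in nltk).

import nltk
from django.conf import settings

# Add the bundled data folder to NLTK's search path.
nltk.data.path.append(settings.NLTK_DIR)

With that, NLTK is able to find the data. :)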
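
For anyone who wants something slightly more defensive than a bare nltk.data.path.append(), here is a minimal sketch. The helper name `register_nltk_data` is hypothetical and not part of NLTK or Django. It only assumes that `nltk.data.path` is an ordinary Python list, so the helper takes the list as an argument. It fails early if the bundled folder is missing from the deployment and avoids adding the same path twice.

```python
import os


def register_nltk_data(search_path, base_dir, *parts):
    """Append base_dir/parts to search_path once, failing early if the folder is missing."""
    data_dir = os.path.join(base_dir, *parts)
    if not os.path.isdir(data_dir):
        # Surfaces a missing folder at startup instead of at the first NLTK lookup.
        raise FileNotFoundError("NLTK data folder not found: " + data_dir)
    if data_dir not in search_path:
        search_path.append(data_dir)
    return data_dir


# Possible usage inside the Django project (names taken from the answer above):
#
#   import nltk
#   from django.conf import settings
#   register_nltk_data(nltk.data.path, settings.BASE_DIR, "first_app", "nltk_data")
```

Calling this once at startup, for example from the module that first imports NLTK, should be enough, since the search path is shared by the whole process.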

displayname