Heroku Django app using NLTK: How do I use the NLTK corpora in the app?

Question

I am using Python's NLTK in a Django app. Locally I have the NLTK corpora downloaded and everything works fine. For Heroku, I tried putting the corpora onto the filesystem and pushing to Heroku (as described here: LookupError: Resource 'corpora/stopwords' not found), but this exceeded Heroku's 1GB limit.

Now I've added the corpora to an AWS S3 bucket, but I can't figure out how to import the NLTK data into the Django app. How would I do this? Thanks!

https://devcenter.heroku.com/articles/python-nltk – Kenneth Reitz, Mar 17 '17 at 19:07

score 1 · Accepted Answer · answered Sep 30 '15 at 17:34

The way to do it was to make the S3 bucket public and then pass the object's public URL to nltk.data.load, which accepts a URL as well as a local resource path.

For example:

import nltk
pos_tagger = nltk.data.load("http://<your S3 bucket with the nltk data>.s3.amazonaws.com/nltk_data/taggers/maxent_treebank_pos_tagger/english.pickle")
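
If several resources are loaded this way, the URL construction could be wrapped in a small helper. The following is only a sketch under assumptions: the function names, the NLTK_S3_BUCKET environment variable, and the nltk_data/ key prefix (taken from the example URL above) are illustrative and not part of NLTK or Heroku. Note also that unpickling data fetched over HTTP is only reasonable for a bucket you control, and that newer NLTK releases may restrict loading pickled resources.

```python
import os

# Hypothetical environment variable holding the bucket name (an assumption,
# e.g. set with `heroku config:set NLTK_S3_BUCKET=...`).
BUCKET_ENV_VAR = "NLTK_S3_BUCKET"


def s3_resource_url(bucket, resource_path):
    """Build the public S3 URL for a resource stored under nltk_data/."""
    return "http://{}.s3.amazonaws.com/nltk_data/{}".format(
        bucket, resource_path.lstrip("/")
    )


def load_from_s3(resource_path, bucket=None):
    """Load an NLTK resource from a public S3 bucket via nltk.data.load."""
    import nltk  # imported lazily so the URL helper works without NLTK

    bucket = bucket or os.environ[BUCKET_ENV_VAR]
    return nltk.data.load(s3_resource_url(bucket, resource_path))
```

With this in place, the tagger from the example above would be loaded as load_from_s3("taggers/maxent_treebank_pos_tagger/english.pickle").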
