
I am using Python's NLTK in a Django app. Locally I have the NLTK corpora downloaded and everything works fine. For Heroku, I tried putting the corpora on the filesystem and pushing them with the app (as described here: LookupError: Resource 'corpora/stopwords' not found), but this exceeded Heroku's 1GB limit.

I have now uploaded the corpora to an AWS S3 bucket, but I can't figure out how to load the NLTK data from there into the Django app. How would I do this? Thanks!

pthamm

1 Answer


The solution was to make the S3 bucket public and then pass the object's public URL to nltk.data.load, which accepts a URL as well as a local resource name.

For example:

import nltk

# Load the pickled tagger directly from its public S3 URL
pos_tagger = nltk.data.load("http://<your S3 bucket with the nltk data>.s3.amazonaws.com/nltk_data/taggers/maxent_treebank_pos_tagger/english.pickle")
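
If several resources are loaded this way, the URL construction can be factored into a small helper. The following is only a sketch under assumptions: the function names s3_resource_url and load_from_s3, the bucket name "my-bucket", and the "nltk_data" key prefix are hypothetical and simply mirror the layout in the example URL above.

```python
def s3_resource_url(bucket, resource_path, prefix="nltk_data"):
    """Build the public HTTP URL of an NLTK resource in a public S3 bucket.

    Hypothetical helper: assumes the bucket mirrors the usual nltk_data
    directory layout under the given key prefix.
    """
    # Drop stray slashes so the joined path has exactly one separator per part.
    parts = [p.strip("/") for p in (prefix, resource_path) if p.strip("/")]
    return "http://{}.s3.amazonaws.com/{}".format(bucket, "/".join(parts))


def load_from_s3(bucket, resource_path):
    """Load an NLTK resource from the bucket via nltk.data.load."""
    # Imported lazily so the URL helper works even where NLTK is not installed.
    import nltk
    return nltk.data.load(s3_resource_url(bucket, resource_path))
```

Usage would then look like pos_tagger = load_from_s3("my-bucket", "taggers/maxent_treebank_pos_tagger/english.pickle"). Since each call fetches the object over HTTP, it is probably best to load a resource once at startup and reuse it rather than loading it per request.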
pthamm