Hey, I'd like to install NLTK's pos_tag on my Heroku server. How can I do so? Please give me the steps, as I'm new to Heroku.
Try this answer: http://stackoverflow.com/a/26334947/181337 – Facundo Casco Sep 08 '15 at 20:08
http://stackoverflow.com/questions/18385303/how-to-install-nltk-modules-in-heroku/35895877 – Kenneth Reitz Mar 17 '17 at 19:06
https://devcenter.heroku.com/articles/python-nltk – Kenneth Reitz Mar 17 '17 at 19:06
4 Answers
I just added official NLTK support to the buildpack! Simply add an nltk.txt file with a list of the corpora you want installed, and everything should work as expected.
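For the pos_tag case in the question, a minimal nltk.txt might look like the sketch below. It assumes you need word_tokenize (the `punkt` package) and pos_tag (the `averaged_perceptron_tagger` package); recent NLTK releases may ask for different identifiers (for example `punkt_tab`), so if a "Resource ... not found" error names a package, list that name instead.

```shell
#!/usr/bin/env bash
# Run from the root of your project (next to requirements.txt).
# Writes nltk.txt with one NLTK data package id per line:
#   punkt                      -> tokenizer models used by word_tokenize
#   averaged_perceptron_tagger -> tagger model used by pos_tag
cat > nltk.txt <<'EOF'
punkt
averaged_perceptron_tagger
EOF

# Show what was written
cat nltk.txt
```

Commit nltk.txt together with a requirements.txt that lists nltk, then push; the buildpack reads nltk.txt during the build.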

Update
As Kenneth Reitz pointed out, a much simpler solution has been added to the heroku-python-buildpack. Add an nltk.txt file to your root directory and list your corpora inside. See https://devcenter.heroku.com/articles/python-nltk for details.
Original Answer
Here's a solution that allows you to install the NLTK data directly on Heroku without adding it to your git repo.
I used similar steps to install TextBlob on Heroku, which uses NLTK as a dependency. I've made some minor adjustments to my original code in steps 3 and 4 that should work for an NLTK-only installation.
The default Heroku buildpack includes a post_compile step that runs after all of the default build steps have completed:
```bash
#!/usr/bin/env bash
# post_compile

if [ -f bin/post_compile ]; then
    echo "-----> Running post-compile hook"
    chmod +x bin/post_compile
    sub-env bin/post_compile
fi
```
As you can see, it looks for your own post_compile file in the bin directory of your project and runs it if it exists. You can use this hook to install the NLTK data.
1. Create the bin directory in the root of your local project.
2. Add your own post_compile file to the bin directory:

```bash
#!/usr/bin/env bash
# bin/post_compile

if [ -f bin/install_nltk_data ]; then
    echo "-----> Running install_nltk_data"
    chmod +x bin/install_nltk_data
    bin/install_nltk_data
fi

echo "-----> Post-compile done"
```

3. Add your own install_nltk_data file to the bin directory:

```bash
#!/usr/bin/env bash
# bin/install_nltk_data

source "$BIN_DIR/utils"

echo "-----> Starting nltk data installation"

# Assumes the NLTK_DATA environment variable is already set:
# $ heroku config:set NLTK_DATA='/app/nltk_data'

# Install the nltk data
# NOTE: The following command installs only averaged_perceptron_tagger,
# so you may want to change it for your specific needs.
# See http://www.nltk.org/data.html
python -m nltk.downloader averaged_perceptron_tagger

# If using TextBlob, use this instead:
# python -m textblob.download_corpora lite

# Open the NLTK_DATA directory (exits with an error if NLTK_DATA is unset,
# so the cleanup below never runs in the wrong directory)
cd "${NLTK_DATA:?NLTK_DATA is not set}"

# Delete all of the zip files
find . -name "*.zip" -type f -delete

echo "-----> Finished nltk data installation"
```

4. Add nltk to your requirements.txt file (or textblob if you are using TextBlob).
5. Commit all of these changes to your repo.
6. Set the NLTK_DATA environment variable on your Heroku app:

```console
$ heroku config:set NLTK_DATA='/app/nltk_data'
```

7. Deploy to Heroku. You will see the post_compile step trigger at the end of the deployment, followed by the NLTK download.
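The local part of the steps above can be sanity-checked before you push. This is a minimal sketch assuming the file layout from steps 1 to 4: `check_nltk_setup` is a hypothetical helper that only inspects files in your project, so it cannot tell whether NLTK_DATA is set on Heroku (`heroku config:get NLTK_DATA` shows that).

```shell
#!/usr/bin/env bash
# Hypothetical pre-deploy check for the bin/post_compile setup.
# Call it from the project root; it inspects local files only.
check_nltk_setup() {
  local status=0
  local f
  # Both hook scripts must exist at these exact paths
  for f in bin/post_compile bin/install_nltk_data; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done
  # The buildpack installs nltk only if requirements.txt lists it (or textblob)
  if ! grep -Eiq '^(nltk|textblob)([<>=~!;[:space:]]|$)' requirements.txt 2>/dev/null; then
    echo "requirements.txt does not list nltk or textblob"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "local NLTK setup looks complete"
  fi
  return "$status"
}
```

Save it under any name (check_nltk_setup.sh, say), then run `source check_nltk_setup.sh && check_nltk_setup` from the project root; a non-zero return status means something is missing.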
I hope you found this helpful! Enjoy!

I'm getting this error after following your instructions: `import bp_cli` / `ImportError: No module named bp_cli`, then `! Push rejected, failed to compile Python app.` and `! Push failed`. – 221B Aug 07 '16 at 08:54
Hi @Sainath. I'm not familiar with your error, but it doesn't seem to be related to NLTK. You might be missing some requirements in your requirements.txt file for other parts of your application. – Michael Godshall Aug 08 '16 at 16:29
I'm also getting the same error. @Sainath did you ever figure out how to solve it? – HarshMarshmallow Nov 17 '16 at 18:39
Figured it out. I didn't follow the last instruction: heroku config:set NLTK_DATA='/app/nltk_data'. Did that and it worked great! – HarshMarshmallow Nov 17 '16 at 18:42
Important note: Heroku Python buildpack v97 changed its behavior, causing the nltk_data directory to be omitted. See https://github.com/heroku/heroku-buildpack-python/issues/356 for a fix. – Dan Grigsby Feb 14 '17 at 18:58
If you only need simple functionality such as pos_tag, tokenizers, and stemming, you can follow these steps:
- List nltk in requirements.txt.
- List the following packages in nltk.txt:
  - wordnet
  - pros_cons
  - reuters
  - hmm_treebank_pos_tagger
  - maxent_treebank_pos_tagger
  - universal_tagset
  - punkt
  - averaged_perceptron_tagger_ru
  - averaged_perceptron_tagger
  - snowball_data
  - rslp
  - porter_test
  - vader_lexicon
  - treebank
  - dependency_treebank

How did you come up with this list? I only need `nltk.data`, `nltk.word_tokenize`, `nltk.pos_tag`, and `nltk.sentiment`. Do you know what I should include in the `nltk.txt` file in this case? – Alaa M. Dec 04 '21 at 21:26
Answering myself: if anyone needs to know which packages to list in `nltk.txt`, check the Heroku logs (open the app on the website > More > View logs) and read the details. For example, if you're missing `punkt`, the log will say `nltk.download('punkt')`, so you add `punkt` to `nltk.txt`, and so on. I ended up needing only `punkt`, `vader_lexicon`, and `averaged_perceptron_tagger`. – Alaa M. Dec 05 '21 at 09:07
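The log-reading approach in the comment above can be partly automated. A hedged sketch: `add_missing_nltk_resources` is a hypothetical helper, and it assumes the log text contains NLTK's usual `nltk.download('<id>')` hint. Feed it your logs, for example `heroku logs -n 500 | add_missing_nltk_resources`, and review nltk.txt before committing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Heroku log text on stdin, find NLTK's
# "nltk.download('<id>')" hints, and append any id not yet in nltk.txt.
add_missing_nltk_resources() {
  touch nltk.txt
  # Make sure the file ends with a newline before appending to it
  if [ -s nltk.txt ] && [ -n "$(tail -c1 nltk.txt)" ]; then
    echo >> nltk.txt
  fi
  grep -o "nltk\.download('[A-Za-z0-9_.-]*')" \
    | sed -E "s/nltk\.download\('([^']*)'\)/\1/" \
    | sort -u \
    | while read -r pkg; do
        # -x -F: fixed-string whole-line match, so "punkt" != "punkt_tab"
        if ! grep -qxF -- "$pkg" nltk.txt; then
          echo "$pkg" >> nltk.txt
          echo "added $pkg"
        fi
      done
}
```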
Follow these steps:
- nltk.txt needs to be present in the root folder.
- List the packages you want to download, such as punkt and stopwords, one per line.
- Change the line endings from Windows (CRLF) to Unix (LF).
Changing the line endings is a very important step. It can easily be done in Sublime Text or Notepad++; in Sublime Text, use View > Line Endings.
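If you prefer the command line, the line-ending conversion can also be done with standard tools. A minimal sketch; the helper name `to_unix_line_endings` is hypothetical, and `dos2unix nltk.txt` does the same job if that tool is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a file with Unix (LF) line endings.
# tr -d '\r' deletes every carriage return; writing to a temporary file
# first avoids truncating the input while it is still being read.
to_unix_line_endings() {
  local f="$1"
  tr -d '\r' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
}

# Usage, from the project root:
#   to_unix_line_endings nltk.txt
#   grep -c $'\r' nltk.txt   # prints 0 once no carriage returns remain
```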
Hope this helps
