6

One can download NLTK corpora punkt and wordnet via the command line:

python3 -m nltk.downloader punkt wordnet

How can I download NLTK corpora via requirements.txt using pip install -r requirements.txt?

For example, one can download spaCy models via requirements.txt using pip install -r requirements.txt by adding the model's URL to requirements.txt (e.g. https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0).

Franck Dernoncourt

3 Answers

5

How can I download NLTK corpora via requirements.txt

Short answer: no way.

The URL for spacy models points to a Python package (setup.py and all that) so it can be downloaded and installed by pip. There are no such pip-installable packages for NLTK data. nltk.downloader downloads data in its own format.

phd
5

There is no way to do this via a requirements.txt file. However, if you need the NLTK wordnet and punkt data, you can use two files: download the NLTK data in one, and import that file into your main file. For example:

nltkmodules.py:

import nltk

nltk.download('wordnet')
nltk.download('punkt')

main.py:

import nltkmodules

# Rest of Code goes here

In your requirements.txt, you can just include:

nltk==3.5
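
As written, nltkmodules.py calls nltk.download on every import. A possible refinement is to download only what is missing. The following is a minimal sketch, not part of the original answer: the helper names ensure_nltk_data and download_missing_with_nltk are hypothetical, and the lookup paths tokenizers/punkt and corpora/wordnet are assumptions about where these packages land in the NLTK data directory.

```python
def ensure_nltk_data(resources, find, download):
    """Download each resource only if `find` cannot locate it.

    resources: mapping of download id -> lookup path (assumed paths).
    find:      callable that raises LookupError when a path is missing.
    download:  callable that fetches a package by its download id.
    Returns the list of ids that had to be downloaded.
    """
    fetched = []
    for package_id, lookup_path in resources.items():
        try:
            find(lookup_path)
        except LookupError:
            download(package_id)
            fetched.append(package_id)
    return fetched


def download_missing_with_nltk():
    """Hypothetical wiring to the real NLTK functions."""
    import nltk  # imported lazily so the helper above has no hard dependency

    return ensure_nltk_data(
        {"punkt": "tokenizers/punkt", "wordnet": "corpora/wordnet"},
        find=nltk.data.find,     # raises LookupError if the resource is absent
        download=nltk.download,  # same call as in nltkmodules.py
    )
```

Calling download_missing_with_nltk() at the top of nltkmodules.py would then replace the two unconditional nltk.download calls.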
Samrat Sahoo
0

Download using the command line:

python -m nltk.downloader stopwords punkt wordnet
gndps
  • Thanks, how can I download NLTK corpora **via** `requirements.txt` using `pip install -r requirements.txt`? – Franck Dernoncourt Nov 10 '22 at 00:06
  • 1
    There's no straightforward way, as requirements.txt expects Python package names hosted on PyPI repositories, or a local package. You could, however, create a `requirements_nltk.txt` and a custom Python package (local or hosted) that contains the actual files of the NLTK data package. Then run `pip install --download=/user/home/nltk/ -r requirements_nltk.txt` and, when using the package, configure the `nltk data dir` (https://stackoverflow.com/questions/3522372/how-to-config-nltk-data-directory-from-code). Super hacky, I know – gndps Nov 10 '22 at 00:23
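
One way to handle the "configure the nltk data dir" step from the comment above is the NLTK_DATA environment variable, which NLTK reads when it is imported. The following is a minimal sketch under that assumption; the helper name point_nltk_at and the example directory are hypothetical.

```python
import os
from pathlib import Path


def point_nltk_at(data_dir):
    """Create `data_dir` if needed and expose it to NLTK via NLTK_DATA.

    Must run before `import nltk`, because NLTK builds its search
    path from this variable at import time.
    """
    path = Path(data_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["NLTK_DATA"] = str(path)
    return path
```

For example, calling point_nltk_at("./nltk_data") before importing nltk makes that directory part of the search path. The data itself can be placed there with python -m nltk.downloader -d ./nltk_data punkt wordnet.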