I am using setuptools to package a module that depends on a pretrained data file for an AI application. In particular, I'm using vader_lexicon.txt, found in the nltk data files.
When you install nltk from pip, it does not automatically download datasets for you. You have to manually run commands, either from your command line or from the Python interpreter, to grab particular datasets. I suspect they did this because bundling every dataset would add hundreds of megabytes to the install.
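For reference, the manual step I mean looks roughly like the sketch below. It assumes nltk is already installed and that the lexicon is registered under the resource path "sentiment/vader_lexicon.zip". The helper name ensure_vader_lexicon is just something I made up for illustration; it is not part of nltk.

```python
def ensure_vader_lexicon():
    """Download the VADER lexicon only if nltk cannot already find it."""
    # Imported lazily so that merely defining this helper does not require nltk.
    import nltk

    try:
        # nltk.data.find raises LookupError when the resource is not installed.
        # The resource path here is an assumption about where nltk stores it.
        nltk.data.find("sentiment/vader_lexicon.zip")
    except LookupError:
        # Fetches the dataset from nltk's remote index (needs network access).
        nltk.download("vader_lexicon")
```

Every user of my package would have to run something like this (or the equivalent downloader command) after installing, which is the step I would like to avoid.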
From the setuptools documentation, it appears that the way to include data files is to use MANIFEST.in, but in that case it looks like you need to ship the data files with your source distribution.
Is there any way to include data that comes from a remote location?