Is it possible to package a Python module that includes an external data dependency downloaded from a URL?

Question

I am using setuptools to package my module, which uses a trained dataset for an AI application. Specifically, it uses `vader_lexicon.txt` from the NLTK data files.

When you install nltk with pip, it does not download any datasets for you. You have to run commands manually, either from the command line or from the Python interpreter, to fetch particular datasets. I suspect this is because bundling every dataset would add hundreds of megabytes.
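For reference, the manual step can also be scripted so the package fetches the lexicon itself the first time it is needed, instead of at install time. The sketch below is a minimal, hypothetical helper (`ensure_vader_lexicon` is not part of NLTK); it assumes nltk is installed by the time the function is called and that the lexicon lives at NLTK's usual resource path `sentiment/vader_lexicon.zip`.

```python
def ensure_vader_lexicon() -> None:
    """Hypothetical helper: download NLTK's vader_lexicon on first use if missing."""
    # Imported lazily so the enclosing module can be imported before nltk is.
    import nltk

    try:
        # Assumed resource path: vader_lexicon ships as sentiment/vader_lexicon.zip
        # inside an nltk_data directory; find() raises LookupError if absent.
        nltk.data.find("sentiment/vader_lexicon.zip")
    except LookupError:
        # Equivalent to `python -m nltk.downloader vader_lexicon` from a shell.
        nltk.download("vader_lexicon", quiet=True)
```

Calling `ensure_vader_lexicon()` before constructing `nltk.sentiment.vader.SentimentIntensityAnalyzer()` means the download happens only once per machine and needs network access only on that first run.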

From the setuptools documentation, it appears that the way to include data files is through MANIFEST.in, but that requires the data files to ship inside the source distribution itself.

Is there any way to include data that comes from a remote location?
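One general pattern (not specific to setuptools or NLTK) is to leave the data out of the package entirely and download it from its URL into a per-user cache on first use. A minimal standard-library sketch follows; `fetch_data_file`, the default cache location `~/.cache/mypkg`, and whatever URL you pass in are all hypothetical placeholders.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path
from typing import Optional


def fetch_data_file(url: str, filename: str, cache_dir: Optional[Path] = None) -> Path:
    """Hypothetical helper: return a local copy of `filename`, downloading it from `url` once."""
    if cache_dir is None:
        # Hypothetical per-user cache location; adjust for your application.
        cache_dir = Path.home() / ".cache" / "mypkg"
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if target.exists():
        return target  # already fetched by an earlier call

    # Download into a temporary file in the same directory, then atomically
    # move it into place, so an interrupted download never leaves a
    # truncated file at `target`.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        os.replace(tmp_name, target)
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # clean up the partial download
        raise
    return target
```

Because the download happens at run time rather than during installation, the package itself stays small, at the cost of needing network access the first time the data is used.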

I think you can find answers here: https://stackoverflow.com/questions/37513279/using-setuptools-how-can-i-download-external-data-upon-installation — Nikolai K., Aug 02 '17 at 15:00
Possible duplicate of [Using setuptools, how can I download external data upon installation?](https://stackoverflow.com/questions/37513279/using-setuptools-how-can-i-download-external-data-upon-installation) — Nikolai K., Aug 02 '17 at 15:02
