I use from nltk.tokenize import word_tokenize, which needs punkt. In code you can download it with nltk.download('punkt').
I do have nltk as a requirement, but there is no target nltk[punkt]. Is there another way I can set this in my setup.py as a requirement? What is the recommended way of dealing with this data dependency of nltk?
Current "solution"
Currently, I just call nltk.download('punkt') within the function, so every single call to this function is slowed down.
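To make the problem concrete, here is a minimal sketch of a guarded variant I could use in the meantime. It assumes that nltk.data.find raises LookupError when a resource is missing and that the tokenizer lives under the resource path tokenizers/punkt. The helper names ensure_resource, ensure_punkt and tokenize are hypothetical, made up for this illustration. It only avoids the repeated download; it does not answer the setup.py part of the question.

```python
from functools import lru_cache


def ensure_resource(resource_path, package_id, find, download):
    """Download package_id only if find(resource_path) raises LookupError.

    find and download are passed in so the logic can be exercised
    without nltk installed (hypothetical helper, not part of nltk).
    Returns True if a download was triggered, False otherwise.
    """
    try:
        find(resource_path)
        return False
    except LookupError:
        download(package_id)
        return True


@lru_cache(maxsize=None)
def ensure_punkt():
    # Imported lazily so that importing this module does not require nltk.
    import nltk

    # Assumption: punkt is stored under the resource path "tokenizers/punkt".
    return ensure_resource("tokenizers/punkt", "punkt", nltk.data.find, nltk.download)


def tokenize(text):
    # lru_cache means the lookup itself runs at most once per process.
    ensure_punkt()
    from nltk.tokenize import word_tokenize

    return word_tokenize(text)
```

With this, only the first call in a process pays for the check, and the download happens only when the data is actually missing.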