Here are detailed instructions for installing punkt manually if nltk.download() doesn't work for you.
Context: I tried to use nltk.word_tokenize() and it threw the error:
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/english.pickle
Searched in:
- 'C:\\Users\\username/nltk_data'
- 'C:\\Users\\username\\anaconda3\\envs\\conda-env\\nltk_data'
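The "Searched in:" list above shows where NLTK looks for data. You can also print that list directly from Python to confirm the folders on your own machine:

>>> import nltk
>>> nltk.data.path  # the list of folders NLTK searches for data, in order

Any folder in that list is a valid destination for the manual download.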
Solution: download the package manually.
Step 1: Look up the corresponding corpus at http://www.nltk.org/nltk_data/. In this case it's "Punkt Tokenizer Models"; click download and store the file in one of the folders mentioned above (if the nltk_data folder does not exist, create it). I picked 'C:\Users\username/nltk_data'.
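If you'd rather script this step than click through the browser, here is a minimal sketch. It assumes the punkt.zip package URL from the nltk_data repository on GitHub (where nltk.download() normally fetches from) and uses the home-directory nltk_data folder as the target:

>>> import os, urllib.request, zipfile
>>> target = os.path.expanduser('~/nltk_data/tokenizers')  # one of the searched folders
>>> os.makedirs(target, exist_ok=True)  # create the folder tree if missing
>>> url = 'https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip'
>>> zip_path, _ = urllib.request.urlretrieve(url, os.path.join(target, 'punkt.zip'))
>>> zipfile.ZipFile(zip_path).extractall(target)  # leaves tokenizers/punkt/... in place

Extracting straight into the tokenizers folder also takes care of the folder structure described in Step 2 below.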
Step 2: Notice that the error says "Attempted to load tokenizers/punkt/english.pickle", which means you must recreate the same folder structure. I created a "tokenizers" folder inside "nltk_data", copied the unzipped content into it, and made sure the file path "C:/Users/username/nltk_data/tokenizers/punkt/english.pickle" was valid.
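To verify everything is in place, ask NLTK to resolve the resource and then tokenize something; nltk.data.find() raises the same LookupError if the folder structure is still wrong:

>>> import nltk
>>> nltk.data.find('tokenizers/punkt/english.pickle')  # succeeds only if the path is valid
>>> nltk.word_tokenize('It works now.')
['It', 'works', 'now', '.']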