Here are detailed instructions for installing punkt manually if nltk.download() doesn't work for you.
Context: I tried to use nltk.word_tokenize() and it threw the error:
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/english.pickle
Searched in:
- 'C:\\Users\\username/nltk_data'
- 'C:\\Users\\username\\anaconda3\\envs\\conda-env\\nltk_data'
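The "Searched in:" list above shows where NLTK looks for data. You can also print that list directly from Python to confirm the folders on your own machine:

>>> import nltk
>>> nltk.data.path  # the list of folders NLTK searches for data, in order

Any folder in that list is a valid destination for the manual download.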
Solution: download the package manually.
Step 1: Look up the corresponding corpus at http://www.nltk.org/nltk_data/. In this case it's "Punkt Tokenizer Models"; click download and store the file in one of the folders mentioned above (if the nltk_data folder does not exist, create it). I picked 'C:\Users\username/nltk_data'.
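If you'd rather script this step than click through the browser, here is a minimal sketch. It assumes the punkt.zip package URL from the nltk_data repository on GitHub (where nltk.download() normally fetches from) and uses the home-directory nltk_data folder as the target:

>>> import os, urllib.request, zipfile
>>> target = os.path.expanduser('~/nltk_data/tokenizers')  # one of the searched folders
>>> os.makedirs(target, exist_ok=True)  # create the folder tree if missing
>>> url = 'https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip'
>>> zip_path, _ = urllib.request.urlretrieve(url, os.path.join(target, 'punkt.zip'))
>>> zipfile.ZipFile(zip_path).extractall(target)  # leaves tokenizers/punkt/... in place

Extracting straight into the tokenizers folder also takes care of the folder structure described in Step 2 below.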
Step 2: Notice that the error says "Attempted to load tokenizers/punkt/english.pickle", which means you must recreate the same folder structure. I created a "tokenizers" folder inside "nltk_data", copied the unzipped content into it, and made sure the file path "C:/Users/username/nltk_data/tokenizers/punkt/english.pickle" was valid.
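To verify everything is in place, ask NLTK to resolve the resource and then tokenize something; nltk.data.find() raises the same LookupError if the folder structure is still wrong:

>>> import nltk
>>> nltk.data.find('tokenizers/punkt/english.pickle')  # succeeds only if the path is valid
>>> nltk.word_tokenize('It works now.')
['It', 'works', 'now', '.']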