
My Code:

import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

ERROR Message:

[ec2-user@ip-172-31-31-31 sentiment]$ python mapper_local_v1.0.py
Traceback (most recent call last):
  File "mapper_local_v1.0.py", line 16, in <module>
    tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 774, in load
    opened_resource = _open(resource_url)
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 888, in _open
    return find(path_, path + ['']).open()
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 618, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource u'tokenizers/punkt/english.pickle' not found.  Please
  use the NLTK Downloader to obtain the resource:

    >>> nltk.download()

  Searched in:
    - '/home/ec2-user/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - u''

I'm trying to run this program on a Unix machine.

As suggested by the error message, I opened the Python shell on my Unix machine and ran the following commands:

import nltk
nltk.download()

Then I downloaded everything available using the downloader's d (download) and l (list) options, but the problem still persists.

I searched for a solution on the internet, but everything I found suggested the same steps I have already described above.

Supreeth Meka
    possible duplicate of [Failed loading english.pickle with nltk.data.load](http://stackoverflow.com/questions/4867197/failed-loading-english-pickle-with-nltk-data-load) – alvas Oct 26 '14 at 22:42

19 Answers

196

To add to alvas' answer, you can download only the punkt corpus:

import nltk
nltk.download('punkt')

Downloading everything sounds like overkill to me, unless that's really what you want.
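A minimal sketch of the fix applied to the code from the question (the resource URL is the one the asker used; the sample sentence is just an illustration):

    import nltk
    import nltk.data

    # One-time download of the Punkt models into ~/nltk_data (a no-op if already present).
    nltk.download('punkt')

    # The load from the question now succeeds because the resource exists on disk.
    tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
    print(tokenizer.tokenize("This is one sentence. This is another one."))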

yprez
72

If you're looking to only download the punkt model:

import nltk
nltk.download('punkt')

If you're unsure which data/model you need, you can install the popular datasets, models and taggers from NLTK:

import nltk
nltk.download('popular')

With the above command, there is no need to use the GUI to download the datasets.
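If the data needs to land in one of the system-wide locations listed in the error (rather than the per-user ~/nltk_data), nltk.download also accepts a target directory. A sketch, assuming you have write permission to that path:

    import nltk

    # Download punkt into one of the directories NLTK searches by default
    # (see the "Searched in:" list in the LookupError). The path is an example;
    # pick one you can actually write to.
    nltk.download('punkt', download_dir='/usr/local/share/nltk_data')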

alvas
37

I got the solution:

import nltk
nltk.download()

Once the NLTK Downloader starts:

d) Download l) List u) Update c) Config h) Help q) Quit

Downloader> d

Download which package (l=list; x=cancel)? Identifier> punkt

Supreeth Meka
34

From the shell you can execute:

sudo python -m nltk.downloader punkt 

If you want to install the popular NLTK corpora/models:

sudo python -m nltk.downloader popular

If you want to install all NLTK corpora/models:

sudo python -m nltk.downloader all

To list the resources you have downloaded:

python -c 'import os; import nltk; print(os.listdir(nltk.data.find("corpora")))'
python -c 'import os; import nltk; print(os.listdir(nltk.data.find("tokenizers")))'
Franck Dernoncourt
15
import nltk
nltk.download('punkt')

Open the Python prompt and run the above statements.

The sent_tokenize function uses an instance of PunktSentenceTokenizer from the nltk.tokenize.punkt module. This instance has already been trained and works well for many European languages. So it knows what punctuation and characters mark the end of a sentence and the beginning of a new sentence.
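For example, once punkt is downloaded, sent_tokenize splits text into sentences with no further setup (the sample text is just an illustration):

    import nltk
    nltk.download('punkt')          # fetch the Punkt models once

    from nltk.tokenize import sent_tokenize

    text = "NLTK is installed. The Punkt models are downloaded. Sentence splitting works now."
    print(sent_tokenize(text))
    # ['NLTK is installed.', 'The Punkt models are downloaded.', 'Sentence splitting works now.']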

Ramineni Ravi Teja
10

The same thing happened to me recently; you just need to download the "punkt" package and it should work.

When you execute "list" (l) after having "downloaded all the available things", is everything marked like the following line?:

[*] punkt............... Punkt Tokenizer Models

If you see this line with the star, it means you have it, and nltk should be able to load it.

eeelnico
    Hey @supreeth-meka, I am glad you found the solution, it is what I suggested you, can you mark my answer as "Accepted" please? – eeelnico Oct 26 '14 at 19:09
7

Go to the Python console by typing

$ python

in your terminal. Then run the following commands in the Python shell to install the respective packages:

>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('averaged_perceptron_tagger')

This solved the issue for me.
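For reference, a small sketch of what those two downloads enable: word_tokenize relies on punkt and pos_tag relies on averaged_perceptron_tagger (the sentence is just an example):

    import nltk
    nltk.download('punkt')
    nltk.download('averaged_perceptron_tagger')

    from nltk import word_tokenize, pos_tag

    # word_tokenize needs punkt; pos_tag needs averaged_perceptron_tagger.
    print(pos_tag(word_tokenize("The tagger works once both resources are installed.")))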

Dharani Manne
7

I was still getting the error even after running the following:

import nltk
nltk.download()

but on Google Colab, this solved my issue:

   !python3 -c "import nltk; nltk.download('all')"
sargupta
6

After adding this line of code, the issue will be fixed:

nltk.download('punkt')
Ankit Rai
5

My issue was that I called nltk.download('all') as the root user, but the process that eventually used nltk was another user who didn't have access to /root/nltk_data where the content was downloaded.

So I simply copied everything recursively from the download location to one of the paths where NLTK looks for it:

cp -R /root/nltk_data/ /home/ubuntu/nltk_data
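An alternative to copying, a sketch assuming the running user can actually read the original download location, is to add that directory to NLTK's search path at runtime:

    import nltk

    # Tell NLTK to also look in the directory where the data was downloaded.
    # This only helps if the current user has read access to that path.
    nltk.data.path.append('/root/nltk_data')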
Raj
4

A plain nltk.download() will not solve this issue. I tried the following and it worked for me:

In the nltk_data folder, create a tokenizers folder and copy your punkt folder into it.

This will work; the folder structure needs to match the layout shown below.
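Reconstructed from the search paths in the error message, the expected layout looks roughly like this (the exact root directory varies by platform):

    nltk_data/
        tokenizers/
            punkt/
                english.pickle
                ...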

Deepthi Karnam
4

For me, none of the above worked, so I downloaded all the files by hand from the website http://www.nltk.org/nltk_data/ and placed them, also by hand, in a "tokenizers" folder inside the "nltk_data" folder. Not a pretty solution, but still a solution.

Camille
4
  1. Execute the following code:

    import nltk
    nltk.download()
    
  2. After this, the NLTK Downloader will pop up.

  3. Select All packages.
  4. Download punkt.
Mayank Kumar
3

You need to rearrange your folders: move your tokenizers folder directly into the nltk_data folder. It does not work if your nltk_data folder contains a corpora folder that in turn contains the tokenizers folder.

alily
3

If you are using a Jupyter Notebook, run the following in a notebook cell:

import nltk

nltk.download()

A popup window will then appear (showing info from https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml); from there, download everything.

Then rerun your code.

vvvvv
2

I faced the same issue. After downloading everything, the 'punkt' error was still there. I searched for the package on my Windows machine at C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers and could see punkt.zip there. I realized that somehow the zip had not been extracted into C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers\punkt. Once I extracted the zip, it worked perfectly.

2

For me it was solved by using the "nltk:" prefix in the resource URL:

http://www.nltk.org/howto/data.html

Failed loading english.pickle with nltk.data.load

sent_tokenizer=nltk.data.load('nltk:tokenizers/punkt/english.pickle')
sakeesh
2

Add the following lines to your script. This will download the punkt data automatically.

import nltk
nltk.download('punkt')
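If you would rather not call the downloader on every run, a common pattern (a sketch, not part of the original answer) is to check for the resource first and download only when it is missing:

    import nltk

    # Only download if the Punkt models are not already on NLTK's search path.
    try:
        nltk.data.find('tokenizers/punkt')
    except LookupError:
        nltk.download('punkt')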
Codemaker2015
0

For me, this command does not work:

    import nltk
    nltk.download()

but I found another solution to this problem that worked for me.

You have to download the punkt file manually yourself; at the time of writing, the main site was not working, but you can download it from archive.org:

https://web.archive.org/web/20230206063107/https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip

After downloading the file you have to go to

C:\Users\LEMOVO\AppData\Roaming\

If the nltk_data folder does not exist, create it and open it. Then create another folder named tokenizers inside it and extract the punkt.zip file into that tokenizers folder.
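To check that NLTK can now see the manually placed data, a quick sanity check (an addition on my part, not part of the original answer):

    import nltk.data

    # Raises LookupError if the punkt models are still not on any search path.
    print(nltk.data.find('tokenizers/punkt'))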

I hope this helps.

laalaguer