How do I download NLTK data?

Question

Updated answer: NLTK works well with Python 2.7. I had Python 3.2, so I uninstalled 3.2 and installed 2.7. Now it works!!

I have installed NLTK and tried to download NLTK Data. I followed the instructions on this site: http://www.nltk.org/data.html

I downloaded NLTK, installed it, and then tried to run the following code:

>>> import nltk
>>> nltk.download()

It gave me the error message like below:

Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    nltk.download()
AttributeError: 'module' object has no attribute 'download'
 Directory of C:\Python32\Lib\site-packages

I tried both nltk.download() and nltk.downloader(); both gave me error messages.

Then I used help(nltk) to pull out the package, it shows the following info:

NAME
    nltk

PACKAGE CONTENTS
    align
    app (package)
    book
    ccg (package)
    chat (package)
    chunk (package)
    classify (package)
    cluster (package)
    collocations
    corpus (package)
    data
    decorators
    downloader
    draw (package)
    examples (package)
    featstruct
    grammar
    help
    inference (package)
    internals
    lazyimport
    metrics (package)
    misc (package)
    model (package)
    parse (package)
    probability
    sem (package)
    sourcedstring
    stem (package)
    tag (package)
    test (package)
    text
    tokenize (package)
    toolbox
    tree
    treetransforms
    util
    yamltags

FILE
    c:\python32\lib\site-packages\nltk

I do see downloader there, so I am not sure why it does not work. I am using Python 3.2.2 on Windows Vista.

Short note: I do not know what the problem is, but what you are doing is correct and should give you a GUI to choose what to download (i.e. you're not doing it wrong, but something *is* wrong) — Miquel, Mar 05 '14 at 23:24
From where did you install NLTK? I highly suggest you install it through a package manager like [pip](https://pypi.python.org/pypi/pip) to handle all the dependencies for you. — Michael Aquilina, Mar 05 '14 at 23:30
I am not sure how to do it. Do you mean I should install pip first and then use it to install NLTK? — Q-ximi, Mar 05 '14 at 23:35
I am not sure how to do it. Do you mean I should install pip first and then use it to install NLTK? I found this resource: [link](http://www.pip-installer.org/en/latest/installing.html#id5) Should I just copy paste the content in `get-pip.py` link to a python file, save it to `c:/python32`? I would really appreciate more information on the details. Thanks. — Q-ximi, Mar 05 '14 at 23:38
Just follow the commands on the [website](http://www.pip-installer.org/en/latest/installing.html) to install pip (you will need to use the command-prompt window to do so). Once it's installed, use a command-prompt window and type "pip install nltk". Once that's done, try running the nltk command again to see if pip resolved your issue. — Michael Aquilina, Mar 05 '14 at 23:43
So I am not seeing anything to download except that huge blob. I could not tell whether I should copy and paste that text, do something else before running the command below it, or copy and paste the whole thing and save it somewhere. — Q-ximi, Mar 05 '14 at 23:50

alvas · Answer 1 · 2018-02-06T01:46:51.133

TL;DR

To download a particular dataset or model, use the nltk.download() function. For example, to download the punkt sentence tokenizer, use:

$ python3
>>> import nltk
>>> nltk.download('punkt')

If you're unsure of which data/model you need, you can start out with the basic list of data + models with:

>>> import nltk
>>> nltk.download('popular')

It will download a list of "popular" resources, which include:

<collection id="popular" name="Popular packages">
      <item ref="cmudict" />
      <item ref="gazetteers" />
      <item ref="genesis" />
      <item ref="gutenberg" />
      <item ref="inaugural" />
      <item ref="movie_reviews" />
      <item ref="names" />
      <item ref="shakespeare" />
      <item ref="stopwords" />
      <item ref="treebank" />
      <item ref="twitter_samples" />
      <item ref="omw" />
      <item ref="wordnet" />
      <item ref="wordnet_ic" />
      <item ref="words" />
      <item ref="maxent_ne_chunker" />
      <item ref="punkt" />
      <item ref="snowball_data" />
      <item ref="averaged_perceptron_tagger" />
    </collection>

EDITED

To avoid errors when downloading larger datasets from NLTK (from https://stackoverflow.com/a/38135306/610569):

$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python

>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
>>> dler.download('popular')

Updated

From v3.2.5, NLTK has a more informative error message when an nltk_data resource is not found, e.g.:

>>> from nltk import word_tokenize
>>> word_tokenize('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/l/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/Users//alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
    opened_resource = _open(resource_url)
  File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
    return find(path_, path + ['']).open()
  File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/Users/alvas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************
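The error message above suggests a simple pattern: look the resource up first and download it only when the lookup fails. Here is a minimal sketch of that idea. The helper name `ensure_resource` and its `find`/`download` parameters are hypothetical (they exist only so the logic can be tried without a network); by default it falls back to `nltk.data.find` and `nltk.download`.

```python
def ensure_resource(resource_path, package_id, find=None, download=None):
    """Download package_id only if resource_path cannot be located.

    find and download default to nltk.data.find and nltk.download.
    They are parameters (an assumption of this sketch) so that the
    logic can be exercised without NLTK or a network connection.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the defaults are needed
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)  # raises LookupError when the resource is missing
        return False         # already installed, nothing was downloaded
    except LookupError:
        download(package_id)
        return True

# Example, using the names from the error message above:
# ensure_resource("tokenizers/punkt", "punkt")
```

The function returns True when it had to download and False when the resource was already present, which can be handy in setup scripts.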

To find nltk_data directory (auto-magically), see https://stackoverflow.com/a/36383314/610569
To download nltk_data to a different path, see https://stackoverflow.com/a/48634212/610569
To config nltk_data path (i.e. set a different path for NLTK to find nltk_data), see https://stackoverflow.com/a/22987374/610569

To install nltk_data in a conda environment, see https://stackoverflow.com/a/53464117/501086 — Shatu, Nov 25 '18 at 02:18

score 35 · Answer 2 · edited Jan 22 '19 at 10:10

Try

nltk.download('all')

This will download all the data, so there is no need to download packages individually.

answered Nov 09 '17 at 10:00 by B K · edited Jan 22 '19 at 10:10 by Noordeen

In my case the UI was not loading, I don't know why, but this helped me. Thanks. – desaiankitb Mar 27 '18 at 17:36
FYI - as of 2019/12/15, that whole folder is about 3.2 GB, including the zip files. – Florin Andrei Dec 15 '19 at 23:21
Thanks a lot @FlorinAndrei for that info. – Deepam Gupta Dec 18 '20 at 14:43

Noordeen · Answer 3 · 2019-01-22T07:57:47.530

16

Install pip. Run in a terminal: sudo easy_install pip

Install NumPy (optional). Run: sudo pip install -U numpy

Install NLTK. Run: sudo pip install -U nltk

Test the installation. Run: python

Then type: import nltk

To download the corpus, run: python -m nltk.downloader all

answered May 03 '18 at 14:52 by Noordeen · edited Jan 22 '19 at 07:57

score 13 · Answer 4 · answered Jun 02 '16 at 06:41

Do not name your file nltk.py. I used the same code in a file named nltk.py and got the same error as you; after I changed the file name, it worked.
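To check for this quickly, you can look for files in your working directory that would be imported in place of the real package. This is only a rough sketch: the helper name `find_shadowing_files` is made up for illustration, and it only looks for a `.py` or leftover `.pyc` file with the package's name.

```python
import os

def find_shadowing_files(directory, package="nltk"):
    """Return paths in `directory` that would shadow `package` on import."""
    hits = []
    # A stale compiled file can shadow the package just like the source file.
    for name in (package + ".py", package + ".pyc"):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            hits.append(path)
    return hits

# Example: check the directory the script is run from.
# print(find_shadowing_files(os.getcwd()))
```

An empty list means no such file was found in that directory; anything it does return is worth renaming or deleting before retrying `import nltk`.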

answered Jun 02 '16 at 06:41 by Gerard

score 13 · Answer 5 · answered May 23 '18 at 13:24

This worked for me:

nltk.set_proxy('http://user:password@proxy.example.com:8080')
nltk.download()
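If the user name or password contains characters such as @ or :, the proxy URL above may be parsed incorrectly. One way around this is to percent-encode the credentials before building the URL. The sketch below uses only the standard library; `build_proxy_url` is a hypothetical helper, and the host, port and credentials in the example are placeholders.

```python
from urllib.parse import quote

def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding the credentials if given."""
    if user is None:
        return f"{scheme}://{host}:{port}"
    # safe="" makes quote() encode every reserved character, including '@' and ':'.
    credentials = quote(user, safe="")
    if password is not None:
        credentials += ":" + quote(password, safe="")
    return f"{scheme}://{credentials}@{host}:{port}"

# Example (placeholder host and credentials):
# import nltk
# nltk.set_proxy(build_proxy_url("proxy.example.com", 8080, "user", "p@ss:word"))
# nltk.download()
```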

answered May 23 '18 at 13:24 by Morteza Mashayekhi

score 10 · Answer 6 · edited Oct 17 '18 at 09:46

Please Try

import nltk

nltk.download()

After running this you get something like this

NLTK Downloader
---------------------------------------------------------------------------
   d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------

Then press d, as follows:

Downloader> d all

On completion you will get the following message; at the prompt, press q to quit:

Done downloading collection all

score 5 · Answer 7 · edited Feb 06 '18 at 03:58

You can't have a saved Python file called nltk.py, because the interpreter will import that file instead of the actual NLTK package.

Rename the file that the Python shell is reading from and try what you were doing originally:

import nltk and then nltk.download()

answered Jun 13 '14 at 18:23 by user3682157 · edited Feb 06 '18 at 03:58 by mrsrinivas

score 5 · Answer 8 · edited Jan 24 '19 at 06:00

It's very simple....

Open PyScripter or any editor.
Create a Python file, e.g. install.py.
Write the code below in it.

import nltk
nltk.download()

A pop-up window will appear; click Download.

[Image: the download window]

answered Oct 10 '17 at 14:33 by Arun Das · edited Jan 24 '19 at 06:00 by layog

The pop-up window is not opening. I have tried many times. The NLTK version is also recent, 3.4.1. What could the issue be? – Hamza Tahir May 31 '19 at 11:24
@HamzaTahir Same happened with me, I restarted my kernel – Lakhani Aliraza Apr 03 '20 at 04:56

score 4 · Answer 9 · edited Feb 06 '18 at 01:52

If you are running a really old version of nltk, then there is indeed no download module available (reference)

Try this:

import nltk
print(nltk.__version__)

As per the reference, anything after 0.9.5 should be fine
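To make that check explicit, the version string can be compared against the 0.9.5 threshold mentioned above. This is a rough sketch using only the standard library; `version_tuple` and `has_downloader` are hypothetical helper names, and the parsing simply stops at the first non-numeric component.

```python
import re

def version_tuple(version):
    """Turn a version string such as '2.0.4' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)

def has_downloader(version, minimum=(0, 9, 5)):
    """True if `version` is newer than the threshold quoted in the answer."""
    return version_tuple(version) > minimum

# Example:
# import nltk
# print(has_downloader(nltk.__version__), hasattr(nltk, "download"))
```

Checking `hasattr(nltk, "download")` alongside the version is a useful cross-check, since a shadowing file or broken install can fail even when the version looks fine.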

answered Mar 05 '14 at 23:33 by Miquel · edited Feb 06 '18 at 01:52

This is actually what I suspected, which is why I suggested the OP uses pip to install NLTK instead. – Michael Aquilina Mar 05 '14 at 23:41
It won't even print out the version info. The version is 2.0.4. That is the link I followed from the book **Natural Language Processing with Python**. Here is the [link](http://www.nltk.org/install.html). – Q-ximi Mar 05 '14 at 23:42
Ok, that's the latest. I guess this isn't it then. Let's leave it here for future reference – Miquel Mar 05 '14 at 23:45
Does any of nltk work for you? Try running: `nltk.word_tokenize("hello world")` and see if it gives you any output – Michael Aquilina Mar 06 '14 at 09:25

score 4 · Answer 10 · edited Feb 03 '16 at 15:30

I had a similar issue. Check whether you are behind a proxy.

If so, set up the proxy before downloading:

nltk.set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))

answered Feb 03 '16 at 15:18 by victor_gu · edited Feb 03 '16 at 15:30 by Undo

score 2 · Answer 11 · edited Feb 05 '16 at 12:39

Add Python to your PATH during installation. After installation, open a command prompt and run: pip install nltk. Then go to IDLE, open a new file, and save it as file.py. In file.py, type the following:

import nltk

nltk.download()

answered Feb 05 '16 at 11:56 by ADITYA AISHWARY · edited Feb 05 '16 at 12:39 by Jaffer Wilson

score 2 · Answer 12 · answered Jan 02 '19 at 21:26

Try downloading the zip files from http://www.nltk.org/nltk_data/, then unzip them and save them in your Python folder, such as C:\ProgramData\Anaconda3\nltk_data

answered Jan 02 '19 at 21:26 by Jenny

This is the method I had to use (in modified form) for a Linux server that is not connected to the internet. I unzipped it under a directory I created, /usr/share/nltk_data/tokenizers/ – Mike Maxwell Aug 05 '21 at 21:13
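For the manual route, unpacking can be scripted with the standard library. The sketch below assumes you have already copied a package zip to the machine and know which category folder it belongs in (for example tokenizers, as in the comment above, or corpora). `install_package_zip` is a made-up helper name and the paths in the example are illustrative only.

```python
import os
import zipfile

def install_package_zip(zip_path, nltk_data_dir, category):
    """Extract a manually downloaded zip into <nltk_data_dir>/<category>/."""
    target = os.path.join(nltk_data_dir, category)
    os.makedirs(target, exist_ok=True)  # create the category folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target

# Example (illustrative paths):
# install_package_zip("punkt.zip", "/usr/share/nltk_data", "tokenizers")
```

The data directory still has to be one that NLTK searches, such as the locations listed in the "Searched in:" part of the LookupError message in Answer 1.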

score 1 · Answer 13 · answered Jul 01 '19 at 10:17

If you previously saved a file named nltk.py and then renamed it to my_nltk_script.py, check whether a file named nltk.py still exists. If so, delete it and run my_nltk_script.py; it should work!

answered Jul 01 '19 at 10:17 by Manasa

tikendraw · Answer 14 · 2021-12-14T09:44:17.150

Just run:

import nltk
nltk.download()

You will then be shown a pop-up asking what to download; select 'all'. It will take some time because of its size, but it will eventually finish.

If you are using Google Colab, you can use:

nltk.download(download_dir='/content/nltkdata')

After running that, you will be asked to select from a menu:

NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Enter d to download. You will then be asked for the identifier of the package to download. You can list the available identifiers with the l command, or enter all if you want all of them. You will then see something like:

Downloading collection 'all'
       | 
       | Downloading package abc to /content/nltkdata...
       |   Unzipping corpora/abc.zip.
       | Downloading package alpino to /content/nltkdata...
       |   Unzipping corpora/alpino.zip.
       | Downloading package biocreative_ppi to /content/nltkdata...
       |   Unzipping corpora/biocreative_ppi.zip.
       | Downloading package brown to /content/nltkdata...
       |   Unzipping corpora/brown.zip.
       | Downloading package brown_tei to /content/nltkdata...
       |   Unzipping corpora/brown_tei.zip.
       | Downloading package cess_cat to /content/nltkdata...
       |   Unzipping corpora/cess_cat.zip.
.
.
. 
 |   Unzipping models/wmt15_eval.zip.
       | Downloading package mwa_ppdb to /content/nltkdata...
       |   Unzipping misc/mwa_ppdb.zip.
       | 
     Done downloading collection all

---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> q
True

Finally, enter q to quit.
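One thing to watch for with a custom download_dir: NLTK may not look in that directory unless it is on its search path. A rough sketch of registering it follows; `register_data_dir` is a hypothetical helper, and in real use the list you pass would be `nltk.data.path` (assumed here to be an ordinary Python list of directories).

```python
def register_data_dir(search_path, data_dir):
    """Put data_dir at the front of search_path unless it is already listed."""
    if data_dir not in search_path:
        search_path.insert(0, data_dir)
    return search_path

# Example (in Colab, after downloading to /content/nltkdata):
# import nltk
# register_data_dir(nltk.data.path, "/content/nltkdata")
```

Inserting at the front means the custom directory is searched before the default locations.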

score -4 · Answer 15 · edited Sep 28 '17 at 21:00

You may try:

>>> import nltk
>>> nltk.download_shell()
Downloader> d
*name of the package*

Happy NLP'ing.

answered Sep 28 '17 at 20:12 by Henrique Brandão · edited Sep 28 '17 at 21:00 by CDspace
