I am using NLTK 2.0.4, installed for EPD's Python 2.7.3 (not Canopy), on Ubuntu 12.10. In the terminal I type:

In [96]: nltk.download_shell()
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> punkt
    Downloading package 'punkt' to /home/espears/nltk_data...

And then it freezes. The relevant punkt.zip file is written to the stated directory, but the downloader never returns control to the prompt.

This example is with IPython, but I tried the same with the regular Python 2.7.3 interpreter and got the same result.

When I try to unzip the file directly with unzip, it reports that the end-of-central-directory signature is not found in the file and that the archive cannot be unzipped. See below:

espears@computer ~/nltk_data/tokenizers $ unzip punkt.zip 
Archive:  punkt.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of punkt.zip or
        punkt.zip.zip, and cannot find punkt.zip.ZIP, period.

This happens with both nltk.download() and nltk.download_shell() in the same way.

I can inspect the .zip file using du to see that initially its size grows from 0 MB to about 2.7 MB, so it is actually downloading something and the file is not empty. But it stops at 2.7 MB (which may or may not correspond to the expected full size of the file) and then the Python shell downloader freezes.
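For anyone who wants to check the same thing from Python rather than with unzip, here is a minimal sketch using only the standard library. The helper name `archive_is_complete` is made up for illustration. It only tells you whether the file on disk is a readable zip, not why the downloader hangs.

```python
import zipfile


def archive_is_complete(zip_path):
    """Return True if zip_path is a readable zip whose members pass their CRC checks.

    Hypothetical helper, not part of NLTK.
    """
    # is_zipfile() looks for the end-of-central-directory record, the same
    # structure that unzip complains about when a download is truncated.
    if not zipfile.is_zipfile(zip_path):
        return False
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        return archive.testzip() is None
```

Calling `archive_is_complete("/home/espears/nltk_data/tokenizers/punkt.zip")` on the file above would presumably return False, since unzip cannot find the central directory.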

sophros
ely
  • Possibly this problem? https://support.enthought.com/entries/25801945-NLTK-Natural-Language-Toolkit-download-function-hangs – BrenBarn Jan 17 '14 at 20:08
  • No, I am not using Canopy. This is an older distribution from Enthought. I am also using it via IPython, but can confirm that the same hanging happens if used directly from the Python terminal. Note that I experience the same issue even when I use `download_shell` which bypasses the graphics concerns. – ely Jan 17 '14 at 20:09

3 Answers

3

I had the same problem and downloaded the necessary items manually from the following link:

http://nltk.org/nltk_data/

This is not the ideal solution, but it works until the issue is fixed.
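Once a package zip has been downloaded by hand, it still has to be unpacked into the right place. The sketch below assumes the layout seen in the question, where punkt lives under the `tokenizers` subfolder of `nltk_data`. The function name and its parameters are hypothetical, and other packages may belong in a different subfolder.

```python
import os
import zipfile


def install_manual_download(zip_path, nltk_data_dir, category="tokenizers"):
    """Unpack a manually downloaded package zip into <nltk_data_dir>/<category>.

    Hypothetical helper; the 'tokenizers' default matches the punkt path
    shown in the question and is an assumption for any other package.
    """
    target = os.path.join(os.path.expanduser(nltk_data_dir), category)
    if not os.path.isdir(target):
        os.makedirs(target)
    with zipfile.ZipFile(zip_path) as archive:
        # Extract every member, preserving the folder structure in the zip.
        archive.extractall(target)
    return target
```

For example, `install_manual_download("punkt.zip", "~/nltk_data")` would place the contents under `~/nltk_data/tokenizers`.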

UPDATE:

I was actually able to run nltk.download() to install cmudict. Maybe this issue only affects certain packages?

emispowder
1

I had the same problem with nltk 3.0.01b. I downloaded the "book" package and monitored the download from the task manager's network display while at the same time checking the size of the target folder (AppData\Roaming\nltk_data on my Windows 7 system). The network traffic ceased and the folder stopped growing at a size of 379 MB. But the Python shell was locked. The following was the last message displayed:

showing info http://nltk.github.com/nltk_data/

However, if you close the Tk window that lists the available download items, the nltk.download() call terminates and the shell prompt comes back.

Sabuncu
0

Most probably it is not stuck; it may still be downloading. The download runs at a much slower rate than you would expect, even with good internet connectivity. I kept checking the folder size in a while loop: it kept increasing slowly, and the download eventually succeeded. It would probably have worked if you had waited. Unzipping may have failed because you tried to unzip before the entire file had been downloaded.
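The folder-size check can be sketched roughly as follows. This is a reconstruction, not the exact loop I ran, and the function names, polling interval, and number of checks are illustrative assumptions. Run it from a second Python process, since the shell running the downloader is blocked.

```python
import os
import time


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def wait_until_stable(path, interval=5.0, stable_checks=3):
    """Poll path until its size is unchanged for stable_checks polls in a row.

    Returns the final size in bytes. The defaults are arbitrary assumptions.
    """
    last = directory_size(path)
    unchanged = 0
    while unchanged < stable_checks:
        time.sleep(interval)
        current = directory_size(path)
        if current == last:
            unchanged += 1
        else:
            # Size changed, so the download is still making progress.
            unchanged = 0
            last = current
    return last
```

A size that stops changing only means no more bytes are arriving. It does not by itself prove the download completed, so it is still worth testing the archive afterwards.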

Raghuram Vadapalli