
For a particular script I'm running, I need the following packages from nltk to be installed:

req_modules = ['punkt', 'stopwords', 'averaged_perceptron_tagger', 'maxent_ne_chunker']

I know I can check whether stopwords is downloaded, like this:

import nltk
import os

if 'stopwords' in os.listdir(nltk.data.find('corpora')):
    print(True)
else:
    print(False)

For me, since I've used stopwords before, this works. However, I want to be able to programmatically check if the other three modules are installed, eventually using something like:

if not all(m in os.listdir(nltk.data.find('models')) for m in ['punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker']):
    # download the ones that aren't already downloaded

They are all labeled as modules in the downloader accessed at nltk.download(). This should be an easy lookup, so I tried something like this to get all downloaded subpackages in one list:

all_downloaded = os.listdir(nltk.data.find("corpora")) + os.listdir(nltk.data.find("models"))

But I get LookupError: Resource 'models' not found. How can I search the 'models' tab in nltk.data just like I can search 'corpora'? I assume the naming conventions for finding these resources are the same, since "corpora" matches the name of the tab shown in the downloader.

Edit:

Taking into account the suggestion below, I tried the following code, but I still get an ImportError even though I have exception handling. What is going on there?

req_modules = {'from nltk import punkt': 'punkt', 'from nltk.corpus import stopwords': 'stopwords',
               'from nltk import pos_tag': 'averaged_perceptron_tagger',
               'from nltk import ne_chunk': 'maxent_ne_chunker',
               'from nltk.stem.porter import PorterStemmer': 'porter_test'}

for m in req_modules:
    try:
        print("Trying: %s" % m)
        exec(m)
    except LookupError or ImportError:
        print("Tried: %s. Resource '%s' was not available and is being downloaded.\n" % (m, req_modules[m]))
        nltk.download(req_modules[m])
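One likely reason the ImportError escapes the handler above: Python evaluates the expression `LookupError or ImportError` first, and because a class is truthy the result is just `LookupError`. Catching both types takes a tuple. A minimal, NLTK-free sketch of the difference (the function names are made up for illustration):

```python
def caught_with_or():
    """Mimics the handler in the question: only LookupError is caught."""
    try:
        raise ImportError("simulated failed import")
    except LookupError or ImportError:  # same as `except LookupError:`
        return "handled"


def caught_with_tuple():
    """A tuple of exception types catches either one."""
    try:
        raise ImportError("simulated failed import")
    except (LookupError, ImportError):
        return "handled"


# The `or` expression collapses to its first operand.
assert (LookupError or ImportError) is LookupError

try:
    caught_with_or()
    escaped = False
except ImportError:
    escaped = True  # the ImportError slipped past the handler
```

In the loop above, that would mean writing `except (LookupError, ImportError):`.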

Edit 2:

Never mind, I got it to work. I changed from nltk import porter_test to from nltk.stem.porter import PorterStemmer, and things work smoothly!

blacksite

2 Answers

1

It looks like you are confusing nltk modules with the files in the nltk_data directory, which the modules use. When you install nltk, you get all of its packages. Various modules and functions require data files, which you fetch into nltk_data with the downloader. (Some of them are in the category "Models", which you may be confusing with "modules".) To figure out which data file to check for, you could run the corresponding function without an nltk_data folder and inspect the error message. For example:

>>> nltk.ne_chunk("anything")
Traceback (most recent call last):
...
raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource
  'chunkers/maxent_ne_chunker/PY3/english_ace_multiclass.pickle'   
  not found.  Please use the NLTK Downloader to obtain the 
  ...

But if it were me, I would not mess with the data files directly. Instead, just try out the service you want and see if it raises an error:

import nltk

try:
    nltk.ne_chunk([])
except LookupError:
    nltk.download("maxent_ne_chunker")
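The same try-and-download pattern can be extended to every resource the script needs. The sketch below is hedged: `ensure_resources` and `nltk_probes` are hypothetical helper names, the probe calls are only guesses at a cheap call that touches each data file, and the resource names are the ones from the question (other NLTK releases may name them differently, and `ne_chunk` may need further data).

```python
def ensure_resources(probes, download):
    """Run each probe; download the resource whose probe raises LookupError.

    probes:   dict mapping a resource name to a zero-argument callable
    download: callable taking a resource name (e.g. nltk.download)
    Returns the list of resource names that had to be downloaded.
    """
    fetched = []
    for name, probe in probes.items():
        try:
            probe()
        except LookupError:
            download(name)
            fetched.append(name)
    return fetched


def nltk_probes():
    """Hypothetical probes for the four resources named in the question."""
    import nltk  # imported lazily so the helper above stays NLTK-free
    from nltk.corpus import stopwords

    return {
        "punkt": lambda: nltk.word_tokenize("a test"),
        "stopwords": lambda: stopwords.words("english"),
        "averaged_perceptron_tagger": lambda: nltk.pos_tag(["test"]),
        "maxent_ne_chunker": lambda: nltk.ne_chunk([]),
    }
```

With NLTK installed, it could be called as `ensure_resources(nltk_probes(), nltk.download)` after `import nltk`.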
alexis
  • So, searching for the presence of downloaded models in the "Models" portion of `nltk` doesn't work the same way as searching for something like `if 'stopwords' in os.listdir(nltk.data.find("corpora"))`? – blacksite Nov 03 '16 at 13:40
  • "Models" is simply a menu tab in the downloader. It has no relationship to the `nltk_data` folder hierarchy, just like there is no `book` folder there. And you must have meant `nltk_data`, not `nltk`. – alexis Nov 03 '16 at 13:45
  • Incidentally, when you do check for the existence of a file don't do it by fetching an entire `os.listdir()`. See http://stackoverflow.com/questions/82831/how-do-i-check-whether-a-file-exists-using-python. – alexis Nov 05 '16 at 23:14
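If you do want to check the data files rather than call the functions, `nltk.data.find` accepts a path relative to `nltk_data` and raises LookupError when nothing matches, so no directory listing is needed. A sketch under assumptions: `chunkers/maxent_ne_chunker` appears in the error message quoted in the answer, the other three paths are assumptions about where the downloader puts those packages, and `missing_resources` is a hypothetical helper.

```python
# Resource name -> assumed path below the nltk_data folder.
RESOURCE_PATHS = {
    "punkt": "tokenizers/punkt",
    "stopwords": "corpora/stopwords",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    "maxent_ne_chunker": "chunkers/maxent_ne_chunker",
}


def missing_resources(paths, find=None):
    """Return the names whose path cannot be located.

    find: callable that raises LookupError for an unknown path;
          defaults to nltk.data.find.
    """
    if find is None:
        import nltk.data  # imported lazily; only needed for the default
        find = nltk.data.find
    missing = []
    for name, path in paths.items():
        try:
            find(path)
        except LookupError:
            missing.append(name)
    return missing
```

A possible use is `for name in missing_resources(RESOURCE_PATHS): nltk.download(name)`.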
0

I got the same error. Running

nltk.download("maxent_ne_chunker")

put a zip file in my /Users/../nltk_data folder. After I extracted the zip, everything worked fine.