For a particular script I'm running, I need the following packages installed from nltk:
req_modules = ['punkt', 'stopwords', 'averaged_perceptron_tagger', 'maxent_ne_chunker']
I know I can check whether stopwords is downloaded, like this:
import nltk
import os
if 'stopwords' in os.listdir(nltk.data.find('corpora')):
    print(True)
else:
    print(False)
For me, since I've used stopwords before, this works. However, I want to be able to programmatically check whether the other three modules are installed, eventually using something like:
if not all(m in os.listdir(nltk.data.find('models')) for m in ['punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker']):
    # download the ones that aren't already downloaded
They are all listed under Models in the downloader accessed with nltk.download(). This should be an easy lookup, so I tried something like this to get all downloaded subpackages in one list:
all_downloaded = os.listdir(nltk.data.find("corpora")) + os.listdir(nltk.data.find("models"))
But I get the LookupError: Resource 'models' not found. How can I search the 'models' tab in nltk.data just like I can search 'corpora'? I assume the naming conventions for finding these resources are the same, as "corpora" is the name of the tab shown in the downloader below.
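One possible approach, sketched here rather than confirmed: nltk.data.find takes a resource path inside nltk_data, and the packages on the Models tab are installed under several different directories rather than a single 'models' one. The directory names in RESOURCE_PATHS below assume the classic nltk_data layout and may differ between NLTK versions; missing_packages and its find parameter are hypothetical names used only for illustration.

```python
# Assumed mapping from downloader package id to its path under nltk_data.
# These directory names are an assumption about the installed layout.
RESOURCE_PATHS = {
    'punkt': 'tokenizers/punkt',
    'stopwords': 'corpora/stopwords',
    'averaged_perceptron_tagger': 'taggers/averaged_perceptron_tagger',
    'maxent_ne_chunker': 'chunkers/maxent_ne_chunker',
}


def missing_packages(paths=RESOURCE_PATHS, find=None):
    """Return the package ids whose resource path cannot be found.

    `find` defaults to nltk.data.find, which raises LookupError when a
    resource is absent; it is a parameter only so the lookup can be swapped.
    """
    if find is None:
        import nltk  # imported lazily so the mapping can be used without nltk
        find = nltk.data.find
    missing = []
    for package, path in paths.items():
        try:
            find(path)
        except LookupError:
            missing.append(package)
    return missing
```

With nltk imported, something like `for p in missing_packages(): nltk.download(p)` would then fetch only the packages that are not yet present.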
Edit:
Taking into account the suggestion below, I tried the following code, but I still get an ImportError, even though I have exception handling. What is going on there?
req_modules = {'from nltk import punkt': 'punkt',
               'from nltk.corpus import stopwords': 'stopwords',
               'from nltk import pos_tag': 'averaged_perceptron_tagger',
               'from nltk import ne_chunk': 'maxent_ne_chunker',
               'from nltk.stem.porter import PorterStemmer': 'porter_test'}
for m in req_modules:
    try:
        print("Trying: %s" % m)
        exec(m)
    except LookupError or ImportError:
        print("Tried: %s. Resource '%s' was not available and is being downloaded.\n" % (m, req_modules[m]))
        nltk.download(req_modules[m])
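A likely reason an ImportError can escape the handler above: Python evaluates `LookupError or ImportError` as an ordinary expression first, and since classes are truthy, the result is just LookupError. Catching both requires a tuple. The sketch below shows the difference; run_import is a hypothetical helper name, not something from nltk.

```python
# `A or B` on two classes evaluates to A, because classes are truthy,
# so `except LookupError or ImportError:` only catches LookupError.
assert (LookupError or ImportError) is LookupError


def run_import(statement):
    """Execute an import statement and report which exception was caught.

    Returns the exception class name, or None if the statement succeeded.
    """
    try:
        exec(statement)
    except (LookupError, ImportError) as err:  # tuple form catches both
        return type(err).__name__
    return None
```

Calling `run_import('from os import no_such_name')` returns the exception name instead of letting the ImportError propagate.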
Edit 2:
I got it to work, never mind. I changed from nltk import porter_test to from nltk.stem.porter import PorterStemmer and things work smoothly!