
I have successfully installed a package called pronounceable for use in a Lambda function. However, the NLTK corpus cmudict is a dependency that cannot be located, so running import pronounceable results in the following error.

> Resource cmudict not found. Please use the NLTK Downloader
> to obtain the resource:
>
>   >>> import nltk
>   >>> nltk.download('cmudict')
>
> For more information see: https://www.nltk.org/data.html
>
>   Attempted to load corpora/cmudict
> 
>   Searched in:
>     - './nltk_data'
>     - '/home/sbx_user1051/nltk_data'
>     - '/var/lang/nltk_data'
>     - '/var/lang/share/nltk_data'
>     - '/var/lang/lib/nltk_data'
>     - '/usr/share/nltk_data'
>     - '/usr/local/share/nltk_data'
>     - '/usr/lib/nltk_data'
>     - '/usr/local/lib/nltk_data'

WHAT I HAVE TRIED

1. Here is the code I used to install the package by way of creating a layer to apply to my Lambda function.

# STEP 1 
mkdir folder
cd folder
virtualenv v-env
source ./v-env/bin/activate
pip install pronounceable
deactivate 


# STEP 2
mkdir python
cd python
cp -r ../v-env/lib64/python3.6/dist-packages/* .
cd ..
zip -r pronounceable_layer.zip python
aws lambda publish-layer-version --layer-name pronounceable --zip-file fileb://pronounceable_layer.zip --compatible-runtimes python3.6

I then selected and added the resulting layer to the Lambda function. Then, as per this suggestion, I placed the contents of cmudict (which I had manually downloaded to my local machine) into a text file within a folder called nltk_data in the Lambda root folder. I also attempted to alleviate the issue by adding an environment variable with the key NLTK_DATA and the value ./nltk_data, and by adding nltk.download('cmudict', download_dir="/var/task/nltk_data") at the top of the function, to no avail.

2. I also used Cloud9 to open the NLTK file data.py and amend the path as per this suggestion, due to a suspicion that nltk.data.path.append() was not working.

3. I also manually set the download path using nltk.download('cmudict', download_dir='/tmp/') as per this suggestion, but this does not appear to work either.

I am at a loss as to what I need to do next.

QUESTION

What do I need to do to ensure that cmudict is available for use by nltk in my Lambda function?

jimiclapton

1 Answer


RESOLVED

Posting answer in case it helps anyone experiencing a similar issue.

I resolved this issue by taking another look at the error message, which indicates that the corpus file cmudict could not be found. The full expected path of this file is as follows:

/var/task/nltk_data/corpora/cmudict/cmudict

That is to say, the file cmudict needs to be placed in a folder called cmudict, which needs to be placed inside corpora, which needs to be placed inside nltk_data.

This can be achieved by creating the path in either of the following ways:

  1. Manually in the Lambda console (right click to create the folders and file, then paste the corpus contents into the editor)

  2. By creating the file structure nltk_data/corpora/cmudict/cmudict on a local machine, zipping the files and uploading the zip file to the Lambda editor.
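
As a rough sketch of the second option, the layout can be assembled and zipped with a short Python script run on a local machine. The function name `build_nltk_data_zip` and the file names passed to it are hypothetical; only the nltk_data/corpora/cmudict/cmudict layout comes from the steps above.

```python
import zipfile
from pathlib import Path


def build_nltk_data_zip(cmudict_file, zip_path):
    """Zip a local cmudict file under nltk_data/corpora/cmudict/cmudict.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    # The archive name mirrors the folder nesting described above.
    arcname = "nltk_data/corpora/cmudict/cmudict"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(Path(cmudict_file), arcname=arcname)
    return arcname
```

For example, `build_nltk_data_zip("cmudict", "nltk_data.zip")` would produce an archive whose only entry is the nested corpus file. Since uploading a zip replaces the function's deployment package, the handler file would presumably need to go into the same archive.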


NOTE - You may also need to amend the Lambda code to reflect the expected path to the corpora, as follows:

import nltk
from nltk.corpus import cmudict

# Add the bundled data directory to NLTK's search path before the corpus is first used.
nltk.data.path.append("/var/task/nltk_data")

You may also wish to set an environment variable and amend the file data.py as described in the answers linked to above.
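
To confirm from inside the function that the corpus sits where the error message expects it, a quick check along these lines may help. It is a simplified stand-in for NLTK's own lookup, not a copy of it (NLTK can also read zipped corpora, which this ignores), and `find_corpus_dir` is a hypothetical name.

```python
import os


def find_corpus_dir(search_dirs, resource="corpora/cmudict"):
    """Return the first directory in search_dirs that contains resource, else None."""
    for base in search_dirs:
        # Build e.g. /var/task/nltk_data/corpora/cmudict from the base directory.
        candidate = os.path.join(base, *resource.split("/"))
        if os.path.exists(candidate):
            return base
    return None


# Example: check the directory from the answer plus one path from the error message.
print(find_corpus_dir(["/var/task/nltk_data", "/usr/share/nltk_data"]))
```

In the handler itself, passing `nltk.data.path` as `search_dirs` should check the same directories that NLTK lists under "Searched in".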

jimiclapton
  • Hello, my question is not related to the problem this thread was created for, but to something I noticed. I have seen the user sbx_user1051 mentioned wherever Lambda is involved, and I tried to find out who this 'sbx_user1051' user is, but had no luck. Any idea? It appears to be a user within a Docker container, but I am not sure. I could not find information about this user itself, but it is mentioned all over. – srinathbharadwaj Feb 02 '21 at 21:15
  • Sorry, I am not familiar with your issue. All the best in finding a resolution. – jimiclapton Feb 02 '21 at 21:47
  • How did you know `/var/task/nltk_data/corpora/cmudict/cmudict` is the *expected* path? – KJ Ang Jun 28 '23 at 05:30
  • Not sure I understand your question. `/var/task` is the lambda function folder (https://www.cloudtechsimplified.com/aws-lambda-layers/) so any dependencies that are placed in here need to be referenced using the appropriate path relative to your folder structure. – jimiclapton Jun 28 '23 at 09:51