
I am trying to launch a Django service with Docker, and the service uses the nltk library. In the Dockerfile I run a setup.py script that calls nltk.download. According to the logs from building the Docker image, this step runs successfully.

But when I run the Docker image and try to connect to my Django service, I get an error saying the stopwords resource cannot be found, as if nltk.download had never run.

Dockerfile code:

RUN . ${PYTHON_VIRTUAL_ENV_FOLDER}/bin/activate && python ${PYTHON_APP_FOLDER}/setup.py

setup.py code:

import nltk
import os

nltk.download('stopwords', download_dir=os.getcwd() + '/nltk_data/')
nltk.download('wordnet', download_dir=os.getcwd() + '/nltk_data/')

Error:

**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('stopwords')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/src/venv/nltk_data'
    - '/usr/src/venv/share/nltk_data'
    - '/usr/src/venv/lib/nltk_data'
**********************************************************************

Any idea what is wrong here? The same code works when I run it without Docker.

yashdosi
  • In what way doesn't it work? Can you add the actual error message, the Dockerfile, and your `docker run` command or `docker-compose.yml` to the question? – David Maze Sep 27 '18 at 11:32
  • `docker run -it -e ENVIRONMENT_NAME=local -e REGION_NAME=local -p 9081:8080 docker_image` - The run command – yashdosi Sep 27 '18 at 11:35
  • @DavidMaze - I have already added the relevant line from Dockerfile in the question. Also, added the error message and docker-run command. – yashdosi Sep 27 '18 at 11:36
  • Mount the `nltk_data`, `docker build -f Dockerfile -v $HOME/nltk_data/:/nltk_data/`. Then before `setup.py`, in Dockerfile, `ENV NLTK_DATA=/nltk_data/` – alvas Sep 27 '18 at 16:51

2 Answers


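I faced the same problem before and did almost exactly what you did, so I'd assume what you're missing is adding the directory you downloaded into (os.getcwd() + '/nltk_data/') to nltk.data.path. NLTK only looks in the directories on that path, which is why your download directory does not appear in the "Searched in" list.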
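A minimal sketch of that suggestion, assuming the corpora are downloaded to one fixed absolute directory instead of whatever os.getcwd() happens to be during the build. The path /usr/src/app/nltk_data and the helper names are hypothetical; the NLTK pieces used are nltk.download(..., download_dir=...) and the nltk.data.path list. NLTK is imported inside the functions so the path helper can be tried without it installed.

```python
import os
from typing import List

# Hypothetical location; any absolute path works as long as the build step
# and the running app agree on it. Assumes NLTK_DATA, if set, is one directory.
NLTK_DATA_DIR = os.environ.get("NLTK_DATA", "/usr/src/app/nltk_data")

CORPORA = ("stopwords", "wordnet")


def ensure_on_search_path(search_path: List[str], target_dir: str) -> List[str]:
    """Append target_dir to search_path in place unless it is already there."""
    normalized = os.path.normpath(target_dir)
    if normalized not in (os.path.normpath(p) for p in search_path):
        search_path.append(normalized)
    return search_path


def download_corpora(target_dir: str = NLTK_DATA_DIR) -> None:
    """Build-time step, e.g. the body of setup.py run from the Dockerfile."""
    import nltk  # imported lazily so this module loads without NLTK installed

    os.makedirs(target_dir, exist_ok=True)
    for package in CORPORA:
        nltk.download(package, download_dir=target_dir)


def register_data_dir(target_dir: str = NLTK_DATA_DIR) -> None:
    """Run-time step, e.g. in Django settings, before any corpus is loaded."""
    import nltk

    # nltk.data.path is a plain list of directories that NLTK searches in order.
    ensure_on_search_path(nltk.data.path, target_dir)
```

Under these assumptions, setup.py would call download_corpora() and Django's settings.py would call register_data_dir(). Appending keeps NLTK's default directories ahead of the new one; using nltk.data.path.insert(0, ...) instead would make the baked-in copy win.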

henriquesalvaro
  • This worked, thanks! But with a slight tweak: I had to set this environment variable in the Dockerfile and then also use an ADD statement. `ENV NLTK_DATA /app/nltk_data/` `ADD . $NLTK_DATA` – yashdosi Oct 01 '18 at 06:44
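For context on why the ENV NLTK_DATA tweak in the comment above works: NLTK reads that variable when nltk.data is first imported and puts its entries ahead of the default locations in nltk.data.path, and a value set with ENV in the Dockerfile is still set when the container runs. The functions below are a simplified, NLTK-free re-creation of that first step, based on how recent NLTK releases behave (details may vary by version); the function names are hypothetical.

```python
import os
from typing import List, Mapping


def dirs_from_nltk_data(environ: Mapping[str, str]) -> List[str]:
    """Split NLTK_DATA the way NLTK does: on os.pathsep, dropping empty parts."""
    raw = environ.get("NLTK_DATA", "")
    return [d for d in raw.split(os.pathsep) if d]


def search_path_preview(environ: Mapping[str, str], defaults: List[str]) -> List[str]:
    """Directories from NLTK_DATA first, then the default locations."""
    return dirs_from_nltk_data(environ) + list(defaults)
```

Passing os.environ and the directories from the question's "Searched in" list as defaults shows the effect: with NLTK_DATA=/app/nltk_data/ set, that directory is searched first, while a download directory that appears only inside setup.py is never searched at all.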

Thanks for the post; it fixed my issue as well!

I had the same issue: punkt does exist inside the Docker container at:

/root/nltk_data/tokenizers/punkt

But when my app tried to load it, it kept complaining that the resource couldn't be found.

Inspired by your post, I added:

ENV NLTK_DATA /root/nltk_data/
ADD . $NLTK_DATA

But still got the same error message. So I tried this:

ENV NLTK_DATA /nltk_data/
ADD . $NLTK_DATA

I don't know why removing /root from the path made a difference, but it worked!

My app uses Flask and uWSGI, so maybe this issue affects both Django and Flask? Thanks anyway!
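When the files exist in the image but the app still cannot see them, as with /root/nltk_data/tokenizers/punkt here, it can help to check from inside the running container, as the same user the app server runs as, which search directories actually contain the resource. One possible (unverified) explanation for the /root case is that the server process runs as a non-root user, so ~/nltk_data resolves somewhere else or /root is not readable; printing os.path.expanduser("~") from the app would confirm or rule that out. A minimal, NLTK-free sketch; the function name is hypothetical, and checking for an unpacked directory or a sibling .zip is a simplification of NLTK's own lookup.

```python
import os
from typing import Iterable, List


def locate_resource(resource: str, search_dirs: Iterable[str]) -> List[str]:
    """Return every location under search_dirs holding `resource`, unpacked or zipped."""
    hits = []
    for base in search_dirs:
        candidate = os.path.join(base, resource)
        zipped = candidate + ".zip"
        # os.access checks permissions for the *current* user, which matters when
        # the app server runs under a different account than the image build did.
        if os.path.isdir(candidate) and os.access(candidate, os.R_OK | os.X_OK):
            hits.append(candidate)
        elif os.path.isfile(zipped) and os.access(zipped, os.R_OK):
            hits.append(zipped)
    return hits
```

For example, calling print(locate_resource("tokenizers/punkt", nltk.data.path)) from a Flask route or Django view shows whether any directory NLTK searches actually holds punkt for that process; an empty list points at a path or permission mismatch rather than a failed download.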

Howard