I am new to Docker and am trying to install some NLTK packages in a Docker image. Here is my Dockerfile:

FROM python:3-onbuild

RUN python -m libs.py

COPY start.sh /libs.py

COPY start.sh /start.sh

EXPOSE 8000

CMD ["/start.sh"]

Here is my libs.py, which lists the NLTK packages to download:

import nltk
nltk.data.path.append('./')
nltk.download('wordnet')
nltk.download('pros_cons')
nltk.download('snowball_data')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_ru')
nltk.download('punkt')
nltk.download('universal_tagset')
nltk.download('maxent_treebank_pos_tagger')
nltk.download('hmm_treebank_pos_tagger')
nltk.download('reuters')
nltk.download('treebank')
nltk.download('vader_lexicon')
nltk.download('porter_test')
nltk.download('rslp')

The Docker image is created successfully, but when I try to use these packages it throws this error:

LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/local/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

Can anybody tell me why the NLTK packages are not installed? Thanks.

Nazir Ahmed

2 Answers

It looks like you have to create a user inside Docker. You should avoid running as root, which is the default in Docker.

Nevertheless you can set the download_dir when using nltk.download():

download(self, info_or_id=None, download_dir=None, quiet=False, force=False, prefix='[nltk_data] ', halt_on_error=True, raise_on_error=False):

If no value is set for download_dir, it will try to save to the default path:

    # decide where we're going to save things to.
    if self._download_dir is None:
        self._download_dir = self.default_download_dir()

More specifically: https://github.com/nltk/nltk/blob/develop/nltk/downloader.py#L919

def default_download_dir(self):
    """
    Return the directory to which packages will be downloaded by
    default.  This value can be overridden using the constructor,
    or on a case-by-case basis using the ``download_dir`` argument when
    calling ``download()``.
    On Windows, the default download directory is
    ``PYTHONHOME/lib/nltk``, where *PYTHONHOME* is the
    directory containing Python, e.g. ``C:\\Python25``.
    On all other platforms, the default directory is the first of
    the following which exists or which can be created with write
    permission: ``/usr/share/nltk_data``, ``/usr/local/share/nltk_data``,
    ``/usr/lib/nltk_data``, ``/usr/local/lib/nltk_data``, ``~/nltk_data``.
    """

Thus it's saving the file at /root/nltk_data/
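
The lookup described in that docstring can be sketched in plain Python. This is only a simplified illustration of "first directory that exists or can be created with write permission", not NLTK's actual implementation; the function name first_usable_dir is hypothetical.

```python
import os


def first_usable_dir(candidates):
    """Return the first candidate that is an existing writable directory,
    or that could be created because its parent is writable.
    Returns None if no candidate qualifies."""
    for path in candidates:
        path = os.path.expanduser(path)
        if os.path.isdir(path):
            if os.access(path, os.W_OK):
                return path
            continue  # exists but is not writable: try the next one
        parent = os.path.dirname(path.rstrip(os.sep)) or os.sep
        if os.path.isdir(parent) and os.access(parent, os.W_OK):
            return path  # does not exist yet, but could be created
    return None
```

Calling it with the directories listed in the docstring, ending with "~/nltk_data", shows which location a given user would end up writing to. Run as root, the home-directory entry expands to /root/nltk_data.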

It looks like you're working from the / directory when the image runs CMD ["/start.sh"], so perhaps there is a permission problem with /root/nltk_data.

In short

Explicitly set the path where you want the nltk_data directory to be downloaded:

nltk.download('popular', download_dir='/path/to/nltk_data/')

Then, when starting a new Python instance, add that directory to the search path:

nltk.data.path.append('/path/to/nltk_data/')
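
Put together, the two steps might look like the sketch below. The names NLTK_DATA_DIR, PACKAGES, download_all and register_data_dir are hypothetical, and the directory is an assumption: it is simply one of the locations from the "Searched in" list in the question. Only nltk.download(..., download_dir=...) and nltk.data.path come from NLTK itself.

```python
import os

# Assumption: a fixed, absolute directory, taken from the "Searched in"
# list in the question's error message.
NLTK_DATA_DIR = "/usr/local/share/nltk_data"

# A few of the packages from the question's libs.py.
PACKAGES = ["wordnet", "punkt", "averaged_perceptron_tagger"]


def download_all(packages=PACKAGES, target=NLTK_DATA_DIR):
    """Download each package into `target`; meant to run at image build time."""
    import nltk  # imported lazily so the constants are usable without nltk

    os.makedirs(target, exist_ok=True)
    for name in packages:
        # nltk.download returns a falsy value when the download fails.
        if not nltk.download(name, download_dir=target, quiet=True):
            raise RuntimeError(f"could not download NLTK package {name!r}")


def register_data_dir(target=NLTK_DATA_DIR):
    """Add `target` to NLTK's search path; call this in the running app."""
    import nltk

    if target not in nltk.data.path:
        nltk.data.path.append(target)
```

The idea is to call download_all() from the script that the Dockerfile's RUN step executes, and register_data_dir() once when the application starts. Raising on a failed download makes the image build stop instead of producing an image with missing data.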

See also: How to config nltk data directory from code?

alvas
  • I added download_dir='./libs_data/nltk_data/', but I get this error while the package files are being unzipped: `The command '/bin/sh -c python -m libs.py' returned a non-zero code: 1` – Nazir Ahmed Nov 01 '17 at 07:26
  • Docker doesn't like dynamic paths; give it a static path. Also, it's advisable to create a user in Docker rather than use root. – alvas Nov 01 '17 at 08:07
  • Adding `nltk.data.path.append('/path/to/nltk')` to the settings.py of my Django app works for me. – Nazir Ahmed Nov 01 '17 at 11:15

You have to add nltk.data.path.append('/path/to/nltk_data') to your settings.py file; the rest of the procedure is the same.

libs.py contains the details of all the packages.

After that, add this to your Dockerfile:

RUN pip install nltk

RUN python nltk_pkg.py

COPY start.sh /nltk_pkg.py

COPY start.sh /start.sh

It works for me.

Nazir Ahmed