Let's say I've pulled the NVIDIA NGC PyTorch docker image like this: docker pull nvcr.io/nvidia/pytorch:21.07-py3

Then I want to add these Python packages: omegaconf, wandb, and pycocotools.

How do I create a new Docker image with both the original Docker image and the additional Python packages?

Also, how do I distribute the new image throughout my organization?

user550701
  • Simply write a Dockerfile that starts from the NVIDIA image and installs the additional packages. – Marcello Romani Aug 27 '21 at 01:57
  • @MarcelloRomani OK, how? – user550701 Aug 27 '21 at 02:17
  • Sorry but this is homework :-) it's straightforward once you know how to write a basic Dockerfile and install a couple python packages. – Marcello Romani Aug 27 '21 at 12:47
  • There is a partial answer here: https://stackoverflow.com/questions/58191215/how-to-add-python-libraries-to-docker-image – user550701 Aug 27 '21 at 20:07
  • I don't understand your gripe about no docs and no examples. Everything you need is here: https://docs.docker.com/get-started/ – SiKing Aug 27 '21 at 20:46
  • I didn't say that. I wrote that the documentation is unclear and the examples incomplete. I had the same experience with the Git documentation, but learned to use it effectively thanks to helpful answers to beginner questions, like this one: https://stackoverflow.com/questions/927358/how-do-i-undo-the-most-recent-local-commits-in-git – user550701 Aug 27 '21 at 22:37
  • @MarcelloRomani No, it is not "homework." It is a common task that all new docker users will want to know how to do at some point. I am sure it is easy when you know how, most things are. – user550701 Aug 27 '21 at 22:48
  • One of the reasons for downvoting a question reads: "This question doesn't show any research effort". – Marcello Romani Aug 29 '21 at 12:21
  • Here's how you ask a basic question the right way: https://stackoverflow.com/questions/26734402/how-to-upgrade-docker-container-after-its-image-changed?rq=1 – Marcello Romani Aug 29 '21 at 12:22
  • The example question you linked to actually shows no evidence that the asker tried anything at all to solve his own problem. – user550701 Aug 30 '21 at 00:30
  • @MarcelloRomani The "doesn't show any research effort" criticism no longer makes any sense because I have now answered my own question. That proves that I did sufficient research, no? – user550701 Aug 30 '21 at 01:19
  • "no longer makes any sense" I would rephrase as "no longer applies" :-P Well done :) – Marcello Romani Aug 30 '21 at 12:01

1 Answer

Create a file named Dockerfile. Add to it the lines explained below.

Add a FROM line to specify the base image:

FROM nvcr.io/nvidia/pytorch:21.07-py3

Upgrade Pip to the latest version:

RUN python -m pip install --upgrade pip

Install the additional Python packages that you need:

RUN python -m pip install omegaconf wandb pycocotools

Altogether, the Dockerfile looks like this:

FROM nvcr.io/nvidia/pytorch:21.07-py3
RUN python -m pip install --upgrade pip
RUN python -m pip install omegaconf wandb pycocotools

In the same directory as the Dockerfile, run this command to build the new image, replacing my-new-image with a name of your choosing:

docker build -t my-new-image .

This works for me, but Pip generates a warning about installing packages as the root user. I found it best to ignore this warning. See the note at the end of this answer to understand why.

The new docker image should now appear on your system:

$ docker images
REPOSITORY                         TAG                            IMAGE ID       CREATED              SIZE
my-new-image                       latest                         082f76972805   13 seconds ago       15.1GB
nvcr.io/nvidia/pytorch             21.07-py3                      7beec3ff8d35   5 weeks ago          15GB
[...]

You can now run the new image:

$ docker run --gpus all -it --rm --ipc=host my-new-image

Then, inside the container, verify that it has the additional Python packages:

# python -m pip list | grep 'omegaconf\|wandb\|pycocotools'
omegaconf                     2.1.1
pycocotools                   2.0+nv0.5.1
wandb                         0.12.1
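
If you want rebuilds to be reproducible, one option is to pin the versions reported above. The following is only a sketch: the file name Dockerfile.pinned is a hypothetical choice, the two RUN steps are merged into one layer, and pycocotools is left unpinned because the `+nv` suffix in its version suggests an NVIDIA-specific build that may not be installable from PyPI under that version string.

```shell
# Write a variant of the Dockerfile with pinned package versions.
# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > Dockerfile.pinned <<'EOF'
FROM nvcr.io/nvidia/pytorch:21.07-py3
# Versions copied from the `pip list` output shown in this answer.
# --no-cache-dir keeps pip's download cache out of the image layer.
RUN python -m pip install --upgrade pip && \
    python -m pip install --no-cache-dir omegaconf==2.1.1 wandb==0.12.1 pycocotools
EOF
```

The variant could then be built with `docker build -f Dockerfile.pinned -t my-new-image .` in the same directory.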

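To distribute the new image within your organization, you can push it to a registry such as Docker Hub. The Docker Hub Repositories documentation details the steps necessary to: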

  1. Create a repository (possibly private)
  2. Push an image
  3. Add collaborators
  4. Pull the image from the repository
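
On the command line, the push step might look roughly like the sketch below. The organization name `myorg` and the tag `21.07-extras` are placeholders that I made up for illustration, and the sketch assumes the repository already exists on Docker Hub (or that your account is allowed to create it on first push).

```shell
# Hypothetical values: replace with your own Docker Hub namespace and tag.
ORG="myorg"
LOCAL_IMAGE="my-new-image"
TAG="21.07-extras"
REMOTE_IMAGE="${ORG}/${LOCAL_IMAGE}:${TAG}"

# Log in, give the local image a registry-qualified name, and upload it.
publish_image() {
    docker login &&
    docker tag "${LOCAL_IMAGE}:latest" "${REMOTE_IMAGE}" &&
    docker push "${REMOTE_IMAGE}"
}

echo "Image will be published as ${REMOTE_IMAGE}"
```

Calling `publish_image` performs the upload. Collaborators who have access to the repository should then be able to fetch the image with `docker login` followed by `docker pull myorg/my-new-image:21.07-extras`.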

NOTE: The problem of non-root users: Although it is considered best practice not to run a Docker container as the root user, in practice non-root users can add several complications.

You could create a non-root user in your Dockerfile with lines like this:

RUN useradd -ms /bin/bash myuser
USER myuser
ENV PATH="$PATH:/home/myuser/.local/bin"

However, if you run the container with volumes mounted using the -v flag, then myuser is granted access to those volumes only if its user ID or group ID matches a user or group on the host system that has access to them. You can modify the useradd command line to specify the desired user ID or group ID, but the resulting image will then not be portable to systems that use different IDs.
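
As a sketch of the useradd modification, the IDs can be passed in as build arguments so that each host builds an image matching its own user. The names USER_UID, USER_GID, Dockerfile.nonroot, and my-new-image-nonroot are hypothetical, and the groupadd/useradd step would fail if the base image already uses the requested IDs. The portability caveat remains: the image only fits hosts where those IDs are the right ones.

```shell
# Write a Dockerfile that layers a non-root user on top of the image built above.
cat > Dockerfile.nonroot <<'EOF'
FROM my-new-image
# Defaults are placeholders; override them at build time with --build-arg.
ARG USER_UID=1000
ARG USER_GID=1000
RUN groupadd --gid "${USER_GID}" myuser && \
    useradd --create-home --shell /bin/bash \
            --uid "${USER_UID}" --gid "${USER_GID}" myuser
USER myuser
ENV PATH="${PATH}:/home/myuser/.local/bin"
EOF

# Build with the IDs of whoever runs this script on the host.
build_nonroot_image() {
    docker build -f Dockerfile.nonroot \
        --build-arg USER_UID="$(id -u)" \
        --build-arg USER_GID="$(id -g)" \
        -t my-new-image-nonroot .
}
```

Calling `build_nonroot_image` produces an image whose myuser has the same numeric IDs as the invoking host user, so files in volumes mounted with -v should be accessible under the usual host permissions.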

Additionally, there appears to be a limitation that prevents a non-root user from accessing a mounted volume that points to an fscrypt-encrypted folder. However, this works fine for me with the root Docker user.

For these reasons, I found it easiest to just let the container run as root.

user550701
  • I have never seen an example Dockerfile that uses `python -m pip` to add Python packages. Nevertheless, I used it here because I learned that it is the proper way to invoke Pip. When using the `pip` command directly, there is a chance that `pip` and `python` actually point to different versions of Python. This possibility is avoided if we use `python -m pip` instead. – user550701 Aug 29 '21 at 20:40
  • One thing I'm looking into, out of curiosity, is whether it's possible to install those Python packages using the base image's package manager. That should get rid of the "don't use pip as root" warning. – Marcello Romani Aug 30 '21 at 12:04
  • Update: no, they don't seem to be available as APT packages... – Marcello Romani Aug 30 '21 at 13:19
  • You can verify the relationship between pip and the Python interpreter with `pip --version`, which prints: `pip 21.1.3 from /opt/conda/lib/python3.8/site-packages/pip (python 3.8)` – Marcello Romani Aug 30 '21 at 13:20
  • This recent answer recommends adding a new user in the Dockerfile and switching to that user before running Pip. It also recommends upgrading Pip first: https://stackoverflow.com/questions/68673221/warning-running-pip-as-the-root-user – user550701 Aug 30 '21 at 16:10
  • Apparently Docker best practice is not to use the root account in the Docker container. https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user https://sysdig.com/blog/dockerfile-best-practices/ – user550701 Aug 30 '21 at 23:58
  • One challenge with using a non-root user in a docker container is that a regular user will by default not have access to any of the mounted volumes in the container. How do I give `myuser` access to a mounted volume? – user550701 Sep 01 '21 at 22:39
  • We want a way to mount volumes as a non-root docker user, but this functionality does not exist. See lengthy discussion, including workarounds, here: https://github.com/moby/moby/issues/2259 – user550701 Sep 01 '21 at 22:55
  • Non-root docker users are conferred file access permissions to mounted volumes based on their userid or groupid matching a host system userid or groupid. But docker images are non-portable if they assume the host system has a user or group with a specific id. Also, this doesn't seem to work at all for `fscrypt` filesystems. Only the root docker user seems to be able to access files in an unlocked `fscrypt` filesystem. – user550701 Sep 02 '21 at 00:24