I am trying to build a Docker container on a server, with a conda environment built inside it. All the other requirements are satisfied except for CUDA-enabled PyTorch (I can get PyTorch working without CUDA, no problem). How do I make sure PyTorch is using CUDA?

This is the Dockerfile:

# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04

# set bash as current shell
RUN chsh -s /bin/bash

# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
        apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
        /bin/bash ~/anaconda.sh -b -p /opt/conda && \
        rm ~/anaconda.sh && \
        ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
        echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
        find /opt/conda/ -follow -type f -name '*.a' -delete && \
        find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
        /opt/conda/bin/conda clean -afy

# set path to conda
ENV PATH /opt/conda/bin:$PATH


# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
    && conda env create --name camera-seg -f /tmp/requirements.yaml \
    && conda install -y -c conda-forge -n camera-seg flake8

# From the pythonspeed tutorial; Make RUN commands use the new environment
SHELL ["conda", "run", "-n", "camera-seg", "/bin/bash", "-c"]

# PyTorch with CUDA 10.2
RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

RUN echo "conda activate camera-seg" > ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH

This gives me the following error when I try to build the container (docker build -t camera-seg .):

.....

Step 10/12 : RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
 ---> Running in e0dd3e648f7b
ERROR conda.cli.main_run:execute(34): Subprocess for 'conda run ['/bin/bash', '-c', 'conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch']' command failed.  (See above for error)

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.



The command 'conda run -n camera-seg /bin/bash -c conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch' returned a non-zero code: 1

This is the requirements.yaml:

name: camera-seg
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - numpy
  - pillow
  - yaml
  - pyyaml
  - matplotlib
  - jupyter
  - notebook
  - tensorboardx
  - tensorboard
  - protobuf
  - tqdm

When I put pytorch, torchvision and cudatoolkit=10.2 in the requirements.yaml instead, PyTorch installs successfully but cannot recognize CUDA (torch.cuda.is_available() returns False).

I have tried various suggested solutions and different combinations of them, but all to no avail.

Any help is much appreciated. Thanks.

Davide Fiocco
Rahul Bohare
  • I have the same issue. Any ideas on how to solve this *without* installing torch with pip? – rubencart Apr 12 '21 at 10:24
  • @rubencart You can install all the dependencies using conda within the requirements file and for torch and torchvision, make an entry within Dockerfile explicitly. For example, add this line in Dockerfile after the line installing the `requirements.yaml` dependencies: `RUN conda install -y -n ${CONDA_ENV_NAME} -c pytorch pytorch torchvision cudatoolkit=10.2 ipython` – Rahul Bohare Apr 13 '21 at 09:52

2 Answers

I got it working after many, many tries. Posting the answer here in case it helps anyone.

Basically, I installed pytorch and torchvision through pip (from within the conda environment) and the rest of the dependencies through conda as usual.

This is how the final Dockerfile looks:

# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04

# set bash as current shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]

# install anaconda
RUN apt-get update && \
        apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
        apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
        /bin/bash ~/anaconda.sh -b -p /opt/conda && \
        rm ~/anaconda.sh && \
        ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
        echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
        find /opt/conda/ -follow -type f -name '*.a' -delete && \
        find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
        /opt/conda/bin/conda clean -afy

# set path to conda
ENV PATH /opt/conda/bin:$PATH


# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
    && conda env create --name camera-seg -f /tmp/requirements.yaml

RUN echo "conda activate camera-seg" >> ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH
ENV CONDA_DEFAULT_ENV camera-seg

And this is what the requirements.yaml looks like:

name: camera-seg
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - pip
  - numpy
  - pillow
  - yaml
  - pyyaml
  - matplotlib
  - jupyter
  - notebook
  - tensorboardx
  - tensorboard
  - protobuf
  - tqdm
  - pip:
    - torch
    - torchvision

Then I build the container using the command docker build -t camera-seg . and PyTorch can now recognize CUDA.
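To confirm this from inside the running container, a check along the following lines may help. It is only a sketch: the function name describe_torch_build is made up for illustration, and it assumes the container was started with GPU access (for example with docker run --gpus all). It separates two cases that both make torch.cuda.is_available() return False: a CPU-only PyTorch build, where torch.version.cuda is None, and a CUDA build that cannot see a GPU or driver.

```python
import importlib.util


def describe_torch_build():
    """Return a small dict describing the installed PyTorch build, if any.

    The function name is hypothetical; only standard torch attributes are used.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means a CPU-only build was installed.
        "built_with_cuda": torch.version.cuda,
        # False with a CUDA build usually means no GPU or driver is visible.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

If built_with_cuda prints None, the installed package is a CPU-only build and no container setting will make CUDA available; if it prints a version but cuda_available is False, the likely culprit is how the container was started.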

Rahul Bohare
  • Any thoughts on doing it for Cuda 11.0? I am tempted to try your solution adjusting for 11.0 but unsure if I want to enter the rabbit hole :) – John Curry Apr 10 '21 at 11:59
  • One thing I hadn't realized is that conda installs its own CUDA toolkit, so there is no need to worry so much about versioning. This solution worked fine for me. – John Curry Apr 10 '21 at 14:43
  • @JohnCurry did you manage to get everything working? You also don't need to install torch and torchvision with pip. They can be installed using conda by adding a statement to the Dockerfile. See my comment on the question. For a different CUDA version, I assume that changing to `cudatoolkit=11.0` should work, along with a change in the very first line of the Dockerfile: `FROM nvidia/cuda:11.0-cudnn.....` . I have not tried it myself though. – Rahul Bohare Apr 13 '21 at 09:55
  • yes all works fine. I didn't change to 11.0 in the end as the code I'm using requires 10.2 in any case. Really useful image, thanks! – John Curry Apr 14 '21 at 09:53

I managed to set it up with the following Dockerfile:

FROM nvidia/cuda:11.3.1-devel-ubuntu20.04
ENV TZ=Europe/Brussels

RUN apt-get update --fix-missing && DEBIAN_FRONTEND=noninteractive apt-get install --assume-yes --no-install-recommends \
   build-essential \
   python3 \
   python3-dev \
   python3-pip

RUN pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

I made sure the CUDA version is the same as the one installed on the machine where the Docker container will run.
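The version check described above can be sketched in Python. This is an illustration under stated assumptions, not part of the original setup: the helper names are hypothetical, the sample header line is made up rather than captured from a real machine, and it assumes that the "CUDA Version" field printed by nvidia-smi on the host is the newest CUDA runtime the installed driver supports, so that a runtime no newer than that should generally work.

```python
import re


def parse_driver_cuda_version(nvidia_smi_text):
    """Extract (major, minor) from the 'CUDA Version: X.Y' field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports(driver_version, runtime_version):
    """True if the driver-reported CUDA version is at least the runtime's."""
    return driver_version >= runtime_version


# Illustrative header line, not output captured from a real machine.
SAMPLE_HEADER = (
    "| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |"
)

if __name__ == "__main__":
    driver = parse_driver_cuda_version(SAMPLE_HEADER)
    print(driver)                            # (11, 6)
    print(driver_supports(driver, (11, 3)))  # True
```

On a real host, the text passed to parse_driver_cuda_version would be the output of nvidia-smi; comparing tuples keeps versions such as 11.10 ordered correctly, which a plain string comparison would not.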

Then I built and ran the image as follows:

$ docker build . -t docker-example:latest
$ docker run --gpus all --interactive --tty docker-example:latest

In a Python shell inside the Docker container, torch.cuda.is_available() then returns True.
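Beyond checking the flag, it can be reassuring to run an actual operation on the device. The following is a minimal sketch; the function name gpu_smoke_test is hypothetical, and it falls back to the CPU when no GPU is visible so that it can also run outside the container.

```python
import importlib.util


def gpu_smoke_test():
    """Multiply two small matrices on the GPU if one is visible, else on the CPU.

    Returns the device type used ("cuda" or "cpu"), or None if PyTorch is
    not installed. The function name is made up for this example.
    """
    if importlib.util.find_spec("torch") is None:
        return None

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.ones(2, 2, device=device)
    b = torch.ones(2, 2, device=device)
    product = a @ b  # every entry of the 2x2 result should equal 2.0
    assert torch.allclose(product, torch.full((2, 2), 2.0, device=device))
    return device.type


if __name__ == "__main__":
    print(gpu_smoke_test())
```

A return value of "cuda" indicates that a tensor was allocated and multiplied on the GPU, which exercises more of the stack than torch.cuda.is_available() alone.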

nim.py