I built a Docker image based on nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04. My Dockerfile looks like this:
ARG CUDA_VERSION=11.3.1
FROM nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu20.04
ARG PYTORCH_VERSION=1.12.1
# Set a docker label to enable container to use SAGEMAKER_BIND_TO_PORT environment variable if present
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
LABEL maintainer="Change Healthcare"
LABEL dlc_major_version="1"
ENV PATH=/opt/conda/bin:$PATH
RUN rm /etc/apt/sources.list.d/*
RUN apt-get update
RUN apt-get install -y curl wget
RUN curl -L -o ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-py38_23.1.0-1-Linux-x86_64.sh
RUN chmod +x ~/miniconda.sh
RUN ~/miniconda.sh -b -p /opt/conda
RUN rm ~/miniconda.sh
RUN /opt/conda/bin/conda install -y ruamel_yaml==0.15.100 cython botocore mkl-include mkl
RUN /opt/conda/bin/conda clean -ya
RUN pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pythonhosted.org
RUN ln -s /opt/conda/bin/pip /usr/local/bin/pip
RUN ln -s /opt/conda/bin/pip /usr/local/bin/pip3
RUN ln -s /opt/conda/bin/python /usr/local/bin/python
RUN pip install packaging==20.4 enum-compat==0.0.3
# Conda installs links for libtinfo.so.6 and libtinfo.so.6.2 both
# Which causes "/opt/conda/lib/libtinfo.so.6: no version information available" warning
# Removing link for libtinfo.so.6. This change is needed only for ubuntu 20.04-conda, and can be reverted
# once conda fixes the issue: https://github.com/conda/conda/issues/9680
RUN rm -rf /opt/conda/lib/libtinfo.so.6
WORKDIR /
RUN cd tmp/ \
&& rm -rf tmp*
# Uninstall and re-install torch and torchvision from the PyTorch website
RUN pip uninstall -y torch
RUN /opt/conda/bin/conda install -y pytorch==${PYTORCH_VERSION} cudatoolkit=11.3 -c pytorch
I started a container from this image and ran the following commands inside it:
import torch
torch.cuda.is_available()
This returns False.
If I instead build the image from nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04, the same check
import torch
torch.cuda.is_available()
returns True.
However, the devel image is much larger than the runtime image, and I would like to use the runtime image as the base. Can anyone help me figure out how to get PyTorch to detect the GPU when using nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04 as the base image?
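In case it helps with diagnosis, here is a minimal sketch of a fuller check that could be run in both containers. The helper name describe_torch_cuda is hypothetical, and the idea that the two images might end up with different PyTorch builds is only an assumption on my part. As far as I know, torch.version.cuda is None for a CPU-only build, which would explain is_available() returning False regardless of the driver.

```python
import importlib


def describe_torch_cuda():
    """Collect basic facts about the installed PyTorch build and its CUDA support.

    Hypothetical helper for comparing the runtime-based and devel-based images.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this environment at all.
        return {"torch_installed": False}

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the binary was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # True only if a usable driver and GPU are visible to the process.
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If built_with_cuda differs between the two images, the problem would seem to be in which PyTorch package conda selected rather than in the base image itself; if it is the same in both, the difference is more likely in what the container can see at run time.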
Regards, Arthur