13

The official PyTorch Docker image is based on nvidia/cuda, which can run on Docker CE without any GPU. It can also run on nvidia-docker, presumably with CUDA support enabled. Is it possible to run nvidia-docker itself on an x86 CPU, without any GPU? Is there a way to build a single Docker image that takes advantage of CUDA support when it is available (e.g. when running inside nvidia-docker) and falls back to the CPU otherwise? What happens when you use torch.cuda from inside Docker CE? What exactly is the difference between Docker CE and nvidia-docker, and why can't nvidia-docker be merged into Docker CE?

breandan
  • 1,965
  • 26
  • 45

1 Answer

17

nvidia-docker is essentially a shortcut for docker --runtime=nvidia. I do hope they merge it one day, but for now it's a third-party runtime. Their GitHub page explains what it is and what it does:

A modified version of runc adding a custom pre-start hook to all containers. If environment variable NVIDIA_VISIBLE_DEVICES is set in the OCI spec, the hook will configure GPU access for the container by leveraging nvidia-container-cli from project libnvidia-container.
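To illustrate the shortcut (a sketch, assuming the nvidia-docker2 package, which registers the nvidia runtime with the Docker daemon), these two invocations do the same thing:

```shell
# With nvidia-docker2 installed, these are equivalent:
nvidia-docker run --rm nvidia/cuda nvidia-smi
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```

Either way, the runtime's pre-start hook wires the GPU devices and driver libraries into the container before your process starts.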

Nothing stops you from running images meant for nvidia-docker with plain docker. They work just fine, but anything in them that requires the GPU will fail.

I don't think you can run nvidia-docker on a machine without a GPU. It won't be able to find the CUDA libraries it's looking for and will error out.

To create an image that can run with both docker and nvidia-docker, the program inside it needs to detect at runtime whether a GPU is available. I'm not sure there's an official way, but you can try one of the following:

  • Check if nvidia-smi is available
  • Check if the directory specified in $CUDA_LIB_PATH exists
  • Check if your program can load the CUDA libraries successfully, and if it can't, fall back to the CPU
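A minimal detection sketch of the first two checks in Python, using only the standard library. Note the assumptions: nvidia-smi being on the PATH is something the NVIDIA runtime arranges, and CUDA_LIB_PATH is a variable your own image would have to set (e.g. to /usr/local/cuda/lib64); Docker does not set it for you.

```python
import os
import shutil

def gpu_available() -> bool:
    """Heuristic check for a GPU-enabled container; not an official API."""
    # nvidia-docker's pre-start hook injects nvidia-smi into the container
    if shutil.which("nvidia-smi") is not None:
        return True
    # Fall back to checking a CUDA library directory.
    # CUDA_LIB_PATH is assumed to be set by the image itself.
    cuda_lib = os.environ.get("CUDA_LIB_PATH", "")
    return bool(cuda_lib) and os.path.isdir(cuda_lib)

device = "cuda" if gpu_available() else "cpu"
print(device)
```

In PyTorch this maps naturally onto torch.device(device), so the same image can move tensors and models with .to(device) regardless of which runtime it's started under.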
kichik
  • 33,220
  • 7
  • 94
  • 114