Do I need nvidia-container-runtime, and why?

Question

I want to access my NVIDIA GPUs from inside containers. Can I do this without nvidia-container-runtime?

Requiring a custom Docker runtime just to talk to one device seems very strange. There is a whole universe of PCI devices out there. Why does this one need its own runtime? For example, suppose I had both NVIDIA and AMD GPUs. Would I be unable to access both from inside one container?

I understand that nvidia-container-runtime lets me control which GPUs are visible via NVIDIA_VISIBLE_DEVICES. But I do not care about this. I am not using containers to isolate devices; I am using containers to manage CUDA/CUDNN/TensorFlow version h*ll. And if I did want to isolate devices, I would use the same mechanism as forever: By controlling access to nodes in /dev.

In short, the whole "custom runtime" design looks flawed to me.

So, questions:

What am I missing?
Can I obtain access to my NVIDIA GPUs using the stock Docker (or podman) runtime?
If not, why not?

Robert Crovella · Answer 1 · 2020-09-25T17:40:25.963

I certainly won't be able to answer every conceivable question related to this. I will try to give a summary. Some of what I write here is based on what's documented here and here. My discussion here will also be focused on linux, and docker (not windows, not singularity, not podman, etc.). I'm also not likely to be able to address in detail questions like "why don't other PCI devices have to do this?". I'm also not trying to make my descriptions of how docker works perfectly accurate to an expert in the field.

The NVIDIA GPU driver has components that run in user space and also other components that run in kernel space. These components work together and must be in harmony. This means the kernel mode component(s) for driver XYZ.AB must be used only with user-space components from driver XYZ.AB (not any other version), and vice-versa.

Roughly speaking, docker is a mechanism to provide an isolated user-space linux presence that runs on top of, and interfaces to, the linux kernel (where all the kernel space stuff lives). The linux kernel is in the base machine (outside the container) and much/most of linux user space code is inside the container. This is one of the architectural factors that allow you to do neato things like run an ubuntu container on a RHEL kernel.

From the NVIDIA driver perspective, some of its components need to be installed inside the container and some need to be installed outside the container.

Can I obtain access to my NVIDIA GPUs using the stock Docker (or podman) runtime?

Yes, you can, and this is what people did before nvidia-docker or the nvidia-container-toolkit existed. You need to install the exact same driver in the base machine as well as in the container. Last time I checked, this works (although I don't intend to provide instructions here.) If you do this, the driver components inside the container match those outside the container, and it works.

What am I missing?

NVIDIA (and presumably others) would like a more flexible scenario. The above description means that if a container was built with any other driver version (than the one installed on your base machine) it cannot work. This is inconvenient.

The original purpose of nvidia-docker was to do the following: At container load time, install the runtime components of the driver, which are present in the base machine, into the container. This harmonizes things, and although it does not resolve every compatibility scenario, it resolves a bunch of them. With a simple rule "keep your driver on the base machine updated to the latest" it effectively resolves every compatibility scenario that might arise from a mismatched driver/CUDA runtime. (The CUDA toolkit, and anything that depends on it, like CUDNN, need only be installed in the container.)

As you point out, the nvidia-container-toolkit has picked up a variety of other, presumably useful, functionality over time.

I'm not spending a lot of time here talking about the compatibility strategy ("forward") that exists for compiled CUDA code, and the compatibility strategy ("backward") that exists when talking about a specific driver and the CUDA versions supported by that driver. I'm also not intending to provide instructions for use of the nvidia-container-toolkit, that is already documented, and many questions/answers about it already exist also.

I won't be able to respond to follow up questions like "why was it architected that way?" or "that shouldn't be necessary, why don't you do this?"

This answer is wholly insufficient. I am running these containers rootless and it is working. Code that runs rootless is not a "driver", by definition. I would like to know more about what it is actually doing, but since you are understandably defensive about this awful design, I guess I will figure it out from the source. Thank you for your time. — Nemo, Sep 26 '20 at 16:41
If you want to review the source, you can get an idea [here](https://github.com/NVIDIA/libnvidia-container/blob/master/src/nvc_mount.c) of what are some of the driver "user space components" that I referred to, that are being mounted in the container, at mount time. — Robert Crovella, Sep 26 '20 at 18:07

score 1 · Accepted Answer · answered Oct 12 '20 at 15:59

To answer my own question: No, we do not need nvidia-container-runtime.

The NVIDIA shared libraries are tightly coupled to each point release of the driver. NVIDIA likes to say "the driver has components that run in user space", but of course that is a contradiction in terms. So for any version of the driver, you need to make the corresponding release of these shared libraries accessible inside the container.

A brief word on why this is a bad design: Apart from the extra complexity, the NVIDIA shared libraries have dependencies on other shared libraries in the system, in particular C and X11. If a newer release of the NVIDIA libraries ever required features from newer C or X11 libraries, a system running those newer libraries could never host an older container. (Because the container would not be able to run the newer injected libraries.) The ability to run old containers on new systems is one of the most important features of containers, at least in some applications. I guess we have to hope that never happens.

The HPC community figured this out and made it work some time ago. Here are some old instructions for creating a portable Singularity GPU container which injects the required NVIDIA shared libraries when the container runs. You could easily follow a similar procedure to create a portable OCI or Docker GPU container.

These days, Singularity supports a --nv flag to inject the necessary shared libraries automatically. It also supports a --rocm flag for AMD GPUs. (Yes, AMD chose the same bad design.) Presumably you could combine these flags if you needed both.

All of these details are pretty well-documented in the Singularity manual.

Bottom line: If you are asking the same question I was, try Singularity.

Do I need nvidia-container-runtime, and why?

2 Answers2

Linked