docker build with nvidia runtime

Question

I have a GPU application that runs unit tests during the image build stage. With Docker 19.03, one can select the NVIDIA runtime with docker run --gpus all, but I also need GPU access during docker build because the unit tests run there. How can I achieve this?

For older versions of Docker that use nvidia-docker2, it was not possible to specify the runtime during the build stage, but you could set the default runtime to nvidia, and docker build worked fine that way. Can I do that in Docker 19.03, which no longer needs nvidia-docker? If so, how?

Related to: https://stackoverflow.com/questions/70157364/docker-make-nvidia-gpus-visible-during-docker-build-process — Evandro Coan, Jul 04 '22 at 14:24

Anton Ganichev · Answer 1 · 2020-06-16T09:10:17.390

60

You need to use nvidia-container-runtime, as explained in the docs: "It is also the only way to have GPU access during docker build".

Steps for Ubuntu:

Install nvidia-container-runtime:

sudo apt-get install nvidia-container-runtime

Edit or create /etc/docker/daemon.json with the following content:

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

Restart docker daemon:

sudo systemctl restart docker

Build your image (the GPU is now available during the build):

docker build -t my_image_name:latest .
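Before restarting the daemon and rebuilding, it can help to confirm that the config file really names nvidia as the default runtime. The following is a minimal sketch, not part of the original answer: the function name configured_default_runtime is hypothetical, and it does a line-based text match rather than real JSON parsing, so it assumes the key and its value sit on the same line, as in the file above.

```shell
#!/bin/sh
# Print the "default-runtime" value found in a Docker daemon config file.
# Assumption: this is a plain text match, not a JSON parser, so the key and
# its value must be on one line. Defaults to /etc/docker/daemon.json.
configured_default_runtime() {
    config_file="${1:-/etc/docker/daemon.json}"
    sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config_file"
}
```

For example, `[ "$(configured_default_runtime)" = nvidia ] || echo "default runtime is not nvidia"` flags a missing or different setting. After the restart, `docker info --format '{{.DefaultRuntime}}'` should report what the running daemon actually uses.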

edited Jun 16 '20 at 09:10

answered May 11 '20 at 19:06

Anton Ganichev


This is due to a syntax error (extra comma in 6-th line). Fixed now. – Anton Ganichev Jun 16 '20 at 09:12

FYI: if you need this because you want to compile custom kernels with PyTorch, you can use an NVIDIA development base image (e.g. nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04) and set `ENV TORCH_CUDA_ARCH_LIST=Turing`; then you can build them without having a GPU available during the build. – RunOrVeith Mar 25 '21 at 09:48

So, literally the *only* way to get `build` to use the nvidia runtime (or I guess, any other runtime) is to set it as the *default*? Geez. How is this not an option? – juanpa.arrivillaga Sep 24 '21 at 23:01
If you do this and are still having issues, make sure that docker-buildx-plugin is NOT installed – Mason Aug 08 '23 at 19:05

score 8 · Answer 2 · answered Feb 01 '20 at 04:00


A "solution" I found is to first run a base image with GPU access enabled, so that the host NVIDIA drivers are mounted into it:

docker run -it --rm --gpus all ubuntu

And then build my app within the container manually and commit the resulting image. This is not ideal and it would be best to have access to nvidia-smi during the build phase.
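The run-then-commit workflow can also be scripted instead of done by hand. This is only a sketch under stated assumptions: the function name build_in_gpu_container, its three arguments, and the DOCKER override variable are all hypothetical, and it runs the build command non-interactively without --rm so the stopped container still exists when docker commit is called.

```shell
#!/bin/sh
# Sketch: run a build command in a GPU-enabled container, then save the
# container's filesystem as an image. All names here are hypothetical.
build_in_gpu_container() {
    base_image="$1"   # image to start from, e.g. ubuntu
    build_cmd="$2"    # command that builds and tests the app
    target_tag="$3"   # tag for the committed image
    docker_bin="${DOCKER:-docker}"   # set DOCKER=echo to preview the commands
    container="gpu-build-$$"

    # No --rm here: the stopped container must survive until it is committed.
    "$docker_bin" run --gpus all --name "$container" "$base_image" sh -c "$build_cmd" || return 1
    "$docker_bin" commit "$container" "$target_tag" || return 1
    "$docker_bin" rm "$container"
}
```

A call might look like `build_in_gpu_container ubuntu 'make test' my_image_name:latest`. Note that the committed image keeps the container's configuration, including the build command as its default command; `docker commit --change` can be used to reset it.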

answered Feb 01 '20 at 04:00

danny


How would you create an image out of the result? – Foobar Aug 21 '22 at 09:24
@Foobar you can create an image of a running container by using `docker commit CONTAINER_NAME TAG` from another terminal. – fgoudra Sep 23 '22 at 12:10
He seems to want to `build` the docker image not `run` it. – Cypher Mar 05 '23 at 15:57

score 7 · Answer 3 · answered Mar 03 '23 at 15:34


IMPORTANT NOTICE
(in addition to the existing answer)

Currently (March 2023), if you have Docker Compose installed, just configuring the default runtime may still not be enough.

In addition to configuring the default runtime, you have to disable BuildKit, Docker's default builder, with:

DOCKER_BUILDKIT=0 docker build <blah>

This applies even if you're not using docker compose, but it applies to docker compose as well of course.
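To see whether a given combination of runtime and builder settings actually exposes the GPU, a build step can probe for it and fail early. The sketch below is an illustration only: the function name require_gpu is hypothetical, and it assumes the base image makes nvidia-smi available when the nvidia runtime is active (for example an nvidia/cuda image).

```shell
#!/bin/sh
# Fail the current build step when no GPU tool is usable.
# probe_cmd defaults to nvidia-smi; the override exists only for dry runs.
require_gpu() {
    probe_cmd="${1:-nvidia-smi}"
    if ! command -v "$probe_cmd" >/dev/null 2>&1; then
        echo "GPU check failed: $probe_cmd not found" >&2
        return 1
    fi
    # -L lists the GPUs the driver can see; a non-zero exit fails the step.
    "$probe_cmd" -L
}
```

Calling require_gpu from a script that a RUN step executes early in the Dockerfile makes the build stop with a clear message rather than failing later inside the unit tests.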


answered Mar 03 '23 at 15:34

Sam De Meyer


Thanks! you saved me! But WHY does this happen? isn't there any fix for this? – Cypher Mar 05 '23 at 15:03
I think this is a regression between docker 23.x and docker 20.x. https://forums.developer.nvidia.com/t/nvidia-driver-is-not-available-on-latest-docker/246265 Installing 5:20.10.24~3-0~ubuntu-focal resolves the error. As of this writing 5:23.0.4-1~ubuntu.20.04~focal seems to have the problem. Instructions for installing a specific version of docker are at https://docs.docker.com/engine/install/ubuntu/ in the "Install Docker Engine" section. I did have to run `sudo apt-get purge docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin` to have everything removed properly. – Kevin Vasko Apr 21 '23 at 13:56

This didn't help ... building it with DOCKER_BUILDKIT=0 or without it, during docker build the GPU is still not available. (Docker engine version 23.0.5) – Jan Apr 27 '23 at 12:37
Same issue here. I am trying to build a DeepStream container for the PC but the build fails with: /usr/bin/ld: warning: libcuda.so.1, needed by /opt/nvidia/deepstream/deepstream-6.0/lib/libnvbufsurftransform.so, not found. If I run the image in bash and build there, it works because I include --gpus all in the run. It would be good to make this automatic rather than manual. – Paul Aug 14 '23 at 09:56
