
I'm building a DeepStream Docker image for NVIDIA GPUs as mentioned in this link.

I have the NVIDIA Container Toolkit installed, and the original Dockerfile works: after building it, I can start a container with GPU support using this command:

sudo docker run --runtime=nvidia --gpus all --name Test -it deepstream:dgpu

The problem is that I want to install PyTorch during the Docker build sequence and use it. As soon as PyTorch is imported in the build sequence, a "Found no NVIDIA driver on your system" error is raised:

#0 0.895 Traceback (most recent call last):
#0 0.895   File "./X.py", line 15, in <module>
#0 0.895     dummy_input = torch.randn([1, 3, 224, 224], device='cuda')
#0 0.895   File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 229, in _lazy_init
#0 0.895     torch._C._cuda_init()
#0 0.895 RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
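
To make it concrete, the failing step looks roughly like this (a minimal sketch; the exact Dockerfile lines and the contents of X.py are illustrative, reconstructed from the traceback above):

# ... earlier Dockerfile steps from the DeepStream build ...
RUN pip3 install torch
COPY X.py ./X.py
# X.py calls torch.randn([1, 3, 224, 224], device='cuda') as in the traceback
RUN python3 ./X.py    # fails here: "Found no NVIDIA driver on your system"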

I have the proper driver installed on the host:

NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6 

And I can use PyTorch properly after the docker build is done and I've started the container with GPU support.
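
For example, a quick check along these lines (an illustrative command, using the container name from above) prints True inside the running container:

sudo docker exec -it Test python3 -c "import torch; print(torch.cuda.is_available())"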

So it seems that the docker build process does not take the NVIDIA driver or GPUs into account, and I can only use the GPUs AFTER the build has been completed. It also seems that there are no --runtime=nvidia or --gpus all flags to pass to the docker build command.

How can I fix this problem so that I can use PyTorch & CUDA during the build process?

UPDATE: The problem seems to be caused by the BuildKit version, as discussed here, here and here. But I still haven't found the proper way to fix it (instead of just setting DOCKER_BUILDKIT=0).
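
For reference, the workaround those discussions point to is making nvidia the default runtime in /etc/docker/daemon.json (sketch below, following the standard NVIDIA Container Toolkit setup), which from those threads appears to only take effect with the legacy builder, i.e. with DOCKER_BUILDKIT=0, which is what I'd like to avoid:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

followed by sudo systemctl restart docker before rebuilding.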
