I encountered a few related questions on SO but there didn't seem to be a sufficient answer.
I'm trying to access the GPU/CUDA from a Docker container that has all the necessary prerequisites installed. I have not managed to get `torch.cuda.is_available()` to return `True`, despite working through many of the solutions offered elsewhere.
For context, I would like to use Docker for a serverless GPU API setup on RunPod. The image I'm currently testing with is a base NVIDIA CUDA 11.8 image; the same problem arose with a RunPod PyTorch image. I am running the container with `--gpus all`,
and I've also tried the answers in this SO question: I have installed `nvidia-container-runtime`, tested with `nvidia-docker`, and everything does work locally (torch can see CUDA on the host machine).
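In case it helps, this is the sanity check I've been running to confirm the container runtime can expose the GPU at all (the image tag here is just an example; substitute whichever one you use):

```shell
# Verify the NVIDIA Container Toolkit can pass the GPU through to a container.
# If this prints the usual nvidia-smi table, the runtime plumbing is working.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Same check via the legacy nvidia-docker wrapper, if that is what's installed:
nvidia-docker run --rm nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

On my host both of these succeed, which is why I suspect the problem is inside the image rather than in the runtime.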
Some observations from inside the Docker container:

- `torch.cuda.is_available()` returns `False`, yet `torch.cuda.device_count() == 1`
- the driver and `nvcc` are both available in the Docker image
- `torch.utils.collect_env` shows CUDA as not available, but reports the CUDA runtime version correctly
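For completeness, here is the small diagnostic script I've been running inside the container to collect these facts in one place. It's just a sketch (it guards against torch being missing); one thing it helps rule out is a CPU-only torch wheel, which reports `torch.version.cuda` as `None`:

```python
import importlib.util


def cuda_report():
    """Gather basic CUDA-visibility facts from torch, if it is installed."""
    report = {}
    if importlib.util.find_spec("torch") is None:
        report["torch_installed"] = False
        return report
    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # None here means a CPU-only build of torch was installed
    report["compiled_cuda_version"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Inside my container this prints `cuda_available: False` alongside a non-`None` compiled CUDA version, which matches the `collect_env` output above.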
I've spent a couple of days trying to debug this. Where else should I be looking, and are there good resources on diagnosing this kind of setup?