I'm trying to set up Container-Optimized OS (COS) on GCE with a GPU, following the instructions at https://cloud.google.com/container-optimized-os/docs/how-to/run-gpus. After creating the VM, the instructions say to SSH in and run `cos-extensions install gpu`. That works; during the install it runs `nvidia-smi`, which prints the driver version (440.33.01) and connects to the card.
But it installs the NVIDIA binaries and libraries in `/var/lib/nvidia`, which is mounted as `noexec` in this OS (it's very locked down). That means none of the libraries or utilities work. And when you mount them into a Docker container, they don't work there either; they're still `noexec`.
The only workaround I've found is to copy the whole `/var/lib/nvidia` dir to a tmpfs scratch disk and use it from there.
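For concreteness, the copy step of that workaround might look roughly like the sketch below. It is an illustration under assumptions, not a recommended fix: `copy_driver_tree` is a hypothetical helper name, and the destination is assumed to already sit on a filesystem mounted with `exec` (for example a tmpfs mounted by root with `mount -t tmpfs -o exec tmpfs <dir>`).

```shell
#!/bin/sh
# copy_driver_tree: hypothetical helper illustrating the workaround.
#   $1 = source directory (in the post: /var/lib/nvidia)
#   $2 = destination directory, assumed to be on an exec-mounted filesystem
copy_driver_tree() {
    src=$1
    dst=$2
    # Fail early if the source directory does not exist.
    [ -d "$src" ] || { echo "missing source: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # -a preserves modes, symlinks, and timestamps;
    # "$src/." copies the directory's contents, including dotfiles.
    cp -a "$src/." "$dst/"
}
```

The copy lives only as long as the tmpfs does, so it would have to be repeated after every reboot.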
Am I using it wrong, or is it just broken?