How do I find the driver version of the node in Autopilot?
I need the 525 driver version on the node - but I suspect it's 470.
Is there a way to specify a nodeSelector
to provision nodes with version 525 of the driver?
In Autopilot clusters, GKE manages driver version selection and installation. If you need the list of GPU driver versions associated with a given GKE version, refer to the corresponding Container-Optimized OS page linked in the GKE current versions table.
For example, if you have selected GKE version 1.25.7-gke.1000, the available COS version is cos-101-17162-127-27, and the supported GPU driver versions are v470.182.03 (default) and v525.105.17.
You can follow this documentation to deploy your GPU workloads on an Autopilot cluster.
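To confirm which driver a node is actually running, one option is to ask `nvidia-smi` from inside a GPU pod scheduled on that node. The following is a minimal sketch, not an official GKE method: it assumes the container image includes Python 3.9+ and that `nvidia-smi` is on the PATH inside the container. The helper names `parse_driver_major` and `query_driver_versions` are hypothetical.

```python
import subprocess


def parse_driver_major(version: str) -> int:
    """Return the major branch of a driver version string, e.g. '525.105.17' -> 525."""
    # Tolerate surrounding whitespace and a leading 'v' as in 'v470.182.03'.
    return int(version.strip().lstrip("v").split(".")[0])


def query_driver_versions() -> list[str]:
    """Ask nvidia-smi for the driver version reported for each visible GPU."""
    # Assumption: nvidia-smi is reachable on PATH inside the GPU container.
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for version in query_driver_versions():
        print(version, "-> branch", parse_driver_major(version))
```

If the reported branch is 470 rather than 525, the node is on the default driver for that COS image.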
Edit 1: The steps below are meant for Standard clusters, not Autopilot.
After adding GPU nodes to your cluster, you need to install NVIDIA's device drivers on the nodes. Google provides a DaemonSet that you can apply to install the drivers. On GPU nodes that use Container-Optimized OS images, you can also choose between the default GPU driver version and a newer version.
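As a rough sketch of the DaemonSet step on a Standard cluster, the snippet below builds the `kubectl apply` command for either the default or the newer driver. The manifest URLs are an assumption based on the GoogleCloudPlatform/container-engine-accelerators repository layout as I recall it; check them against the current GKE documentation before use. The names `MANIFESTS` and `installer_command` are hypothetical.

```python
import subprocess

# Assumption: these paths match the installer manifests published in the
# GoogleCloudPlatform/container-engine-accelerators repository. Verify first.
BASE = (
    "https://raw.githubusercontent.com/GoogleCloudPlatform/"
    "container-engine-accelerators/master/nvidia-driver-installer/cos/"
)
MANIFESTS = {
    "default": BASE + "daemonset-preloaded.yaml",
    "latest": BASE + "daemonset-preloaded-latest.yaml",
}


def installer_command(channel: str = "default") -> list[str]:
    """Build the kubectl command that applies the driver installer DaemonSet."""
    if channel not in MANIFESTS:
        raise ValueError(f"unknown driver channel: {channel!r}")
    return ["kubectl", "apply", "-f", MANIFESTS[channel]]


if __name__ == "__main__":
    # Requires kubectl configured against the Standard cluster.
    subprocess.run(installer_command("latest"), check=True)
```

On the COS image from the example above, "latest" would presumably correspond to the 525 branch and "default" to 470, but that mapping depends on the COS release.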
Note: This content is taken from the official Google Cloud documentation, which is linked in the text above.