
I want to run some GPU workloads on my bare metal k8s cluster, so I have installed the NVIDIA containerd runtime on the cluster. But the Cilium CNI pods crash when I make nvidia the default runtime. (I'll post about that somewhere else.)

I'm thinking I should be able to work around this problem by scheduling only the GPU pods on the nvidia runtime and leaving runc as the default. Is it possible to specify different runtime engines for different workloads? Is this a good workaround? If so, how do I configure it?

This is how I've installed the NVIDIA drivers and containerd runtime: https://docs.nvidia.com/datacenter/cloud-native/kubernetes/install-k8s.html#option-2-installing-kubernetes-using-kubeadm

I found this documentation, but it's a little dry https://kubernetes.io/docs/concepts/containers/runtime-class/


1 Answer


Well... I feel dumb for not reading the docs more closely. Here I am to answer my own question.

  1. Create a RuntimeClass like this:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
```

  2. Add runtimeClassName: nvidia to the pod spec (not the container spec) of any pods that you want to run on the nvidia containerd runtime; see the example below.
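For example, a GPU pod could look something like this. It's a minimal sketch: the pod name, the CUDA image tag, and the nvidia.com/gpu resource request (which needs the NVIDIA device plugin from the install guide linked in the question) are just illustrative, not taken from my cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                 # hypothetical name
spec:
  runtimeClassName: nvidia       # run this pod with the nvidia handler defined above
  restartPolicy: OnFailure
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04   # example image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1        # requires the NVIDIA device plugin to be running
```

Pods that don't set runtimeClassName keep using runc, since that's still containerd's default runtime.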

That's all. It just works.
