
Is there any way to configure nodeSelector at the namespace level?

I want to run a workload only on certain nodes for this namespace.


3 Answers


To achieve this, you can use the PodNodeSelector admission controller.

First, you need to enable it in your kubernetes-apiserver:

  • Edit /etc/kubernetes/manifests/kube-apiserver.yaml:
    • find the --enable-admission-plugins= flag
    • add PodNodeSelector to its list (example below)
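
For example, assuming NodeRestriction is already enabled (the exact plugin list varies by cluster), the flag would become:

--enable-admission-plugins=NodeRestriction,PodNodeSelector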

Now you can set the scheduler.alpha.kubernetes.io/node-selector annotation on your namespace, for example:

apiVersion: v1
kind: Namespace
metadata:
  name: your-namespace
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: env=test
spec: {}
status: {}

After these steps, all the pods created in this namespace will have this section automatically added:

nodeSelector:
  env: test
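
As a quick check, you can create a throwaway pod in the namespace and inspect the injected node selector (the pod name test-selector is arbitrary, and this assumes the namespace above exists):

kubectl run test-selector --image=nginx -n your-namespace
kubectl get pod test-selector -n your-namespace -o jsonpath='{.spec.nodeSelector}'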

You can find more information about PodNodeSelector in the official Kubernetes documentation: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#podnodeselector


kubeadm users

If you deployed your cluster using kubeadm and want to make this configuration persistent, you have to update the kubeadm config:

kubectl edit cm -n kube-system kubeadm-config

specify extraArgs with custom values under the apiServer section:

apiServer:
  extraArgs:
    enable-admission-plugins: NodeRestriction,PodNodeSelector
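
For context, in the kubeadm-config ConfigMap this snippet lives inside the ClusterConfiguration document, which must carry its own apiVersion and kind when fed back to kubeadm (a sketch with other fields omitted; the API version depends on your kubeadm release). If those two fields are missing from the file, kubeadm rejects it with a "kind and apiVersion is mandatory" error (see the comments below):

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    enable-admission-plugins: NodeRestriction,PodNodeSelector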

then update your kube-apiserver static manifest on all control-plane nodes:

# Kubernetes 1.22 and later:
kubectl get configmap -n kube-system kubeadm-config -o=jsonpath="{.data.ClusterConfiguration}" > kubeadm-config.yaml

# Before Kubernetes 1.22:
# "kubeadm config view" was deprecated in 1.19 and removed in 1.22
# Reference: https://github.com/kubernetes/kubeadm/issues/2203
kubeadm config view > kubeadm-config.yaml

# Regenerate the kube-apiserver manifest from the file produced by either command above
kubeadm init phase control-plane apiserver --config kubeadm-config.yaml

kubespray users

You can just set the kube_apiserver_enable_admission_plugins variable in your api-server configuration variables:

kube_apiserver_enable_admission_plugins:
  - PodNodeSelector
kvaps
  • I've SSH'd to one of my nodes, but at /etc/kubernetes/manifests there is only a file named kube-proxy.manifest; there is no kube-apiserver.yaml. Do I need to create it? – jmhostalet Oct 30 '19 at 11:36
  • I believe you need access to the kubernetes master; it's not a feature for a node. – Kevin Dec 05 '19 at 21:21
  • How can we achieve this in managed kubernetes clusters like EKS? – Aziz Zoaib Jun 02 '20 at 12:50
  • When running `kubeadm init phase control-plane apiserver --config kubeadm-config.yaml` I get `invalid configuration for GroupVersionKind /v1, Kind=ConfigMap: kind and apiVersion is mandatory information that must be specified` – Josh Woodcock May 05 '21 at 21:00
  • From what I understand, this solution makes a namespace's pods schedule onto nodes that fit, but does it prevent pods from other namespaces from scheduling on those nodes? Can tolerations be configured on a namespace? – dotslashlu Aug 26 '22 at 01:51
  • @dotslashlu No, it doesn't. For such an effect you need to use [PodTolerationRestriction](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#podtolerationrestriction) – hnaderi Oct 29 '22 at 06:15
  • As of 23 Jan 2023, on Kubernetes `v1.24.6` on AKS, it seems PodNodeSelector is automatically enabled, since I didn't have to run through that configuration. – Constantin Jan 23 '23 at 20:00

I totally agree with the @kvaps answer, but something is missing: you also need to add a label to your node:

kubectl label node <yournode> env=test

That way, pods created in a namespace annotated with scheduler.alpha.kubernetes.io/node-selector: env=test will be schedulable only on nodes carrying the env=test label.
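
To see which nodes currently carry that label:

kubectl get nodes -l env=test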

Nicolas Pepinster
  • It depends. If you are using multiple node pools, each one already comes with a unique label that trickles down to its nodes, so there is no need to label nodes individually. – Mario Jacobo Mar 11 '21 at 14:45
  • Adding labels this way is not recommended for clusters with autoscalers: if the node goes down, the replacement will not have this label. It's best to define labels on node pools, as @MarioJacobo mentioned. – Bart C Jan 19 '22 at 09:49

To dedicate nodes to hosting only the resources belonging to a namespace, you also have to prevent other resources from being scheduled on those nodes.

This can be achieved by combining a node selector and a taint, both injected via admission controllers when you create resources in the namespace. This way, you don't have to manually label and add tolerations to each resource; it is sufficient to create them in the namespace.

What each property achieves:

  • the node selector forces scheduling of the namespace's resources only onto the selected nodes
  • the taint denies scheduling on the selected nodes to any resource outside the namespace

Configuration of nodes/node pool

Add a taint to the nodes you want to dedicate to the namespace:

kubectl taint nodes project.example.com/GPUsNodePool=true:NoSchedule -l=nodesWithGPU=true

This example adds the taint to the nodes that already have the label nodesWithGPU=true. You can also taint nodes individually by name: kubectl taint node my-node-name project.example.com/GPUsNodePool=true:NoSchedule

Add the corresponding label to the same nodes:

kubectl label nodes project.example.com/GPUsNodePool=true -l=nodesWithGPU=true
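
As a sanity check, you can confirm that the taint and the label landed on the intended nodes:

kubectl describe node <node> | grep Taints
kubectl get nodes -l project.example.com/GPUsNodePool=true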

The same can be done if, for example, you use Terraform and AKS. The node pool configuration:

resource "azurerm_kubernetes_cluster_node_pool" "GPUs_node_pool" {
   name                  = "gpusnp"
   kubernetes_cluster_id = azurerm_kubernetes_cluster.clustern_name.id
   vm_size               = "Standard_NC12" # https://azureprice.net/vm/Standard_NC12
   node_taints = [
       "project.example.com/GPUsNodePool=true:NoSchedule"
   ]
   node_labels = {
       "project.example.com/GPUsNodePool" = "true"
   }
   node_count = 2
}

Namespace creation

Then create the namespace with instructions for the admission controllers:

apiVersion: v1
kind: Namespace
metadata:
  name: gpu-namespace
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: "project.example.com/GPUsNodePool=true"  # poorly documented: the format has to be "label-key=label-value"
    scheduler.alpha.kubernetes.io/defaultTolerations: '[{"operator": "Equal", "value": "true", "effect": "NoSchedule", "key": "project.example.com/GPUsNodePool"}]'
    project.example.com/description: 'This namespace is dedicated only to resources that need a GPU.' 
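
With these annotations in place, a pod created in the namespace is scheduled as if its spec contained roughly the following (a sketch of what the admission controllers inject; note that the defaultTolerations annotation only takes effect if the PodTolerationRestriction admission controller is enabled):

nodeSelector:
  project.example.com/GPUsNodePool: "true"
tolerations:
- key: project.example.com/GPUsNodePool
  operator: Equal
  value: "true"
  effect: NoSchedule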

Done! Create resources in the namespace, and the admission controllers together with the scheduler will do the rest.


Testing

Create a sample pod with no label or toleration, but in the namespace:

kubectl run test-dedicated-ns --image=nginx --namespace=gpu-namespace

# list pods in the namespace
kubectl get po -n gpu-namespace

# get node name 
kubectl get po test-dedicated-ns -n gpu-namespace -o jsonpath='{.spec.nodeName}'

# check running pods on a node
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node>
nyxgear