136

I want to scale up/down the number of machines to increase/decrease the number of nodes in my Kubernetes cluster. When I add one machine, I’m able to successfully register it with Kubernetes, so a new node is created as expected. However, it is not clear to me how to smoothly shut down the machine later. A good workflow would be:

  1. Mark the node related to the machine that I am going to shut down as unschedulable;
  2. Start the pod(s) that are running on that node on other node(s);
  3. Gracefully delete the pod(s) that are running on that node;
  4. Delete the node.

If I understood correctly, even kubectl drain (discussion) doesn't do what I expect, since it doesn’t start the pods before deleting them (it relies on a replication controller to start the pods afterwards, which may cause downtime). Am I missing something?

How should I properly shut down a machine?

Rafael
  • As I understand it, if you are not running your pod for high availability (that is, several replicas per pod) you should not expect zero downtime if your pod goes down. This is not specific to the node-removal scenario; it applies to any and all scenarios in which a pod gets rescheduled to a different node. If you do not have HA and are running a single replica, you will get downtime. – Andrew Savinykh Feb 17 '19 at 21:35

8 Answers

178

List the nodes and get the <node-name> you want to drain (i.e., remove from the cluster):

kubectl get nodes

1) First, drain the node:

kubectl drain <node-name>

You might have to ignore DaemonSets and local data on the machine:

kubectl drain <node-name> --ignore-daemonsets --delete-local-data
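On newer kubectl versions the --delete-local-data flag has been replaced by --delete-emptydir-data (the flag used in a later answer on this page), so the equivalent invocation is:

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data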

2) Edit the instance group for the nodes (only if you are using kops)

kops edit ig nodes

Set the MIN and MAX size to one less than their current values, then just save the file (nothing extra needs to be done).
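For illustration, a rough sketch of the edit this opens; the field names come from the kops instance group spec, and the example values of 2 are made up, so decrement whatever yours show:

# Inside the editor opened by `kops edit ig nodes`, lower both fields by one
# (the values shown here are only examples):
#   minSize: 2   ->   minSize: 1
#   maxSize: 2   ->   maxSize: 1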

You might still see some pods on the drained node that belong to DaemonSets, such as the networking plugin, fluentd for logs, kube-dns/CoreDNS, etc.

3) Finally, delete the node:

kubectl delete node <node-name>

4) Commit the state for kops in S3 (only if you are using kops):

kops update cluster --yes

OR (if you are using kubeadm)

If you are using kubeadm and would like to reset the machine to the state it was in before running kubeadm join, then run:

kubeadm reset
Amit Thawait
44
  1. Find the node with kubectl get nodes. We’ll assume the name of the node to be removed is “mynode”; replace that with the actual node name in the steps below.
  2. Drain it with kubectl drain mynode
  3. Delete it with kubectl delete node mynode
  4. If using kubeadm, run kubeadm reset on “mynode” itself
Andrew Marshall
F. Kam
6

Rafael, kubectl drain does work as you describe. There is some downtime, just as if the machine crashed.

Can you describe your setup? How many replicas do you have, and are you provisioned such that you can't handle any downtime of a single replica?
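For what it’s worth, here is a minimal sketch of a setup that lets a drain proceed without user-visible downtime, assuming a hypothetical Deployment labeled app=myapp: run more than one replica and add a PodDisruptionBudget, which the evictions performed by kubectl drain respect.

# myapp / app=myapp / myapp-pdb are placeholder names for this sketch.
# Run several replicas so losing one pod does not take the service down.
kubectl scale deployment myapp --replicas=3

# Keep at least 2 pods available at all times; evictions that would violate
# this budget are rejected, and kubectl drain retries them.
kubectl create poddisruptionbudget myapp-pdb --selector=app=myapp --min-available=2

# The drain now reschedules the pods gradually instead of all at once.
kubectl drain <node-name> --ignore-daemonsets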

  • 1
    Currently, I’m evaluating Kubernetes in a separate environment where I have a bunch of EC2 instances with different applications/pods. I understand that I shouldn’t run applications as single replicas. But I’m also not going to run an enormous number of replicas for each application; therefore, losing one replica would impact the overall capacity of the application. Although I can live with that eventually, I don’t think it is a reasonable approach if it is caused by a planned action such as scaling down the number of machines (nodes). – Rafael Mar 03 '16 at 14:34
  • 1
    It turns out that Kubernetes will do this properly, as long as you set an appropriate grace period, have a readinessProbe, and handle SIGTERM properly. https://github.com/kubernetes/kubernetes/issues/20473 covers a similar issue (from the perspective of a rolling update). Let me know if you need more specifics and I'll be happy to help. – Matt Liggett Mar 08 '16 at 00:22
  • 5
    Instead of using drain, we are marking the node as unschedulable, getting the list of its deployments and forcing a re-deploy on each of its deployments (by changing an annotation). Since the node is unschedulable, Kubernetes allocates the pods in other nodes automatically. This solution was easier for us. – Rafael Oct 13 '16 at 19:50
  • This answer covers points 1-3. But what about point 4? How do you delete the node from k8s's list of nodes? – zaTricky Mar 02 '18 at 09:05
  • 1
    Found an answer, though I don't see it listed in --help output: kubectl delete node – zaTricky Mar 02 '18 at 09:09
4

Follow these steps to remove the worker node from Kubernetes:

  1. List all the nodes in the cluster:
kubectl get nodes
  2. Drain the node in preparation for maintenance:
kubectl drain <node-name> --ignore-daemonsets
  3. Delete the node by its name:
kubectl delete node <node-name>
Péter Szilvási
Abhishek
3
If the cluster was created by kops:

1. Drain the node (all of its pods will be evicted):

kubectl drain <node-name>

2. You might have to ignore DaemonSets and local data:

kubectl drain <node-name> --ignore-daemonsets --delete-local-data

3. Edit the instance group and set its max and min size to 0:

kops edit ig nodes-3 --state=s3://bucketname

4. Delete the node:

kubectl delete node <node-name>

5. Update the cluster:

kops update cluster --state=s3://bucketname --yes

6. Rolling update, if required:

kops rolling-update cluster --state=s3://bucketname --yes

7. Validate the cluster:

kops validate cluster --state=s3://bucketname

Now the instance will be terminated.
Francis John
3

When draining a node there is a risk that the remaining nodes end up unbalanced and that some processes suffer downtime. The purpose of this method is to keep the load balanced between nodes as much as possible, in addition to avoiding downtime.

# Mark the node as unschedulable.
echo Mark the node as unschedulable $NODENAME
kubectl cordon $NODENAME

# Get the list of namespaces that have pods running on the node.
NAMESPACES=$(kubectl get pods --all-namespaces -o custom-columns=:metadata.namespace --field-selector spec.nodeName=$NODENAME | sort -u | sed -e "/^ *$/d")

# Force a rollout restart of every deployment in those namespaces.
# Since the node is unschedulable, Kubernetes schedules
# the new pods on other nodes automatically.
for NAMESPACE in $NAMESPACES
do
    echo deployment restart for $NAMESPACE
    for DEPLOYMENT in $(kubectl get deployments -n $NAMESPACE -o name)
    do
        kubectl rollout restart $DEPLOYMENT -n $NAMESPACE
    done
done

# Wait for the deployment rollouts to finish.
for NAMESPACE in $NAMESPACES
do
    echo deployment status for $NAMESPACE
    for DEPLOYMENT in $(kubectl get deployments -n $NAMESPACE -o name)
    do
        kubectl rollout status $DEPLOYMENT -n $NAMESPACE
    done
done

# Drain node to be removed
kubectl drain $NODENAME
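If the machine is actually being removed from the cluster (rather than just rebalanced), the node object can then be deleted as in the other answers:

kubectl delete node $NODENAME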
3

The command below only works if you have a lot of replicas, disruption budgets, etc., but it helps a lot with improving cluster utilization. In our cluster we have integration tests kicked off throughout the day (pods run for an hour and then spin down) as well as some dev workloads (which run for a few days until a dev spins them down manually). I run this every night and go from ~100 nodes in the cluster down to ~20, which adds up to a fair amount of savings:

for node in $(kubectl get nodes -o name| cut -d "/" -f2); do
  kubectl drain --ignore-daemonsets --delete-emptydir-data $node;
  kubectl delete node $node;
done
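A slightly more defensive variant of the same loop (a sketch, not part of the original answer) only deletes a node once its drain has succeeded, so that a node whose drain is blocked, for example by a disruption budget, is not deleted out from under its pods:

for node in $(kubectl get nodes -o name | cut -d "/" -f2); do
  # Only delete the node if the drain completed successfully.
  kubectl drain --ignore-daemonsets --delete-emptydir-data "$node" && kubectl delete node "$node";
done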
0

kubectl drain showed some strange behavior for me. Here are my extra steps; otherwise, in my case, DATA WILL BE LOST!

Short answer: CHECK THAT no PersistentVolume is mounted to this node. If there is some PV, see the following description of how to remove it.


When executing kubectl drain, I noticed that some Pods were not evicted (they just did not show up in the log lines like evicting pod xxx).

In my case, some were pods with soft anti-affinity (so they did not want to go to the remaining nodes), and some belonged to a StatefulSet of size 1 that wants to keep at least 1 pod.
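A quick way to see what is still running on the node after the drain (using the same field selector that appears in another answer on this page):

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>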

If I directly delete that node (using the commands mentioned in other answers), data will be lost, because those pods have PersistentVolumes, and deleting a Node also deletes its PersistentVolumes (with some cloud providers).

Thus, please manually delete those pods one by one. Once they are deleted, Kubernetes will re-schedule the pods onto other nodes (because this node is SchedulingDisabled).

After deleting all pods (excluding DaemonSets), please CHECK THAT no PersistentVolume is mounted to this node.
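One rough way to perform that check, assuming your attached volumes show up as VolumeAttachment objects (adjust for your storage provider):

# This should print nothing for the node before you delete it.
kubectl get volumeattachments | grep <node-name>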

Then you can safely delete the node itself :)

ch271828n
  • Does this data loss only apply to local storage types, or also dynamically provisioned network storage? – benjimin Oct 15 '22 at 16:39