
I have a Kubernetes cluster with 1 master and 2 slave nodes. When a node goes down, it takes Kubernetes approximately 5 minutes to notice the failure. I am using dynamic provisioning for volumes, and this delay is a bit too long for me. How can I reduce the failure detection time? I found a post about it: https://fatalfailure.wordpress.com/2016/06/10/improving-kubernetes-reliability-quicker-detection-of-a-node-down/

At the bottom of the post, it says we can reduce the detection time by changing these parameters:

kubelet: node-status-update-frequency=4s (from 10s)
controller-manager: node-monitor-period=2s (from 5s)
controller-manager: node-monitor-grace-period=16s (from 40s)
controller-manager: pod-eviction-timeout=30s (from 5m)

I can change the node-status-update-frequency parameter on the kubelet, but I don't have any controller manager program or command on the CLI. How can I change those parameters? Any other suggestions for reducing the detection time would be appreciated.
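For context, a minimal sketch of the kubelet change on a kubeadm-managed node (this assumes the kubelet service picks up KUBELET_EXTRA_ARGS from /etc/default/kubelet; on RHEL-based systems the file is /etc/sysconfig/kubelet, and the path may differ on other setups):

```sh
# Sketch: set the kubelet flag via KUBELET_EXTRA_ARGS (kubeadm convention).
# /etc/default/kubelet is an assumption; adjust the path for your distribution.
echo 'KUBELET_EXTRA_ARGS=--node-status-update-frequency=4s' | sudo tee /etc/default/kubelet

# Reload systemd and restart the kubelet so the new flag takes effect.
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```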

Adi Soyadi

2 Answers


> ...but I don't have any controller manager program or command on the CLI. How can I change those parameters?

You can change/add those parameters in the controller-manager systemd unit file and restart the daemon. Please check the reference documentation for the controller-manager here.
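For example, a sketch of what the unit file change could look like (the unit path, binary location, and kubeconfig flag shown here are assumptions; keep your existing flags and adapt the paths to your install):

```ini
# /etc/systemd/system/kube-controller-manager.service (hypothetical path, excerpt)
# Append the new flags to the existing ExecStart line, keeping the flags already there.
[Service]
ExecStart=/usr/local/bin/kube-controller-manager \
  --kubeconfig=/etc/kubernetes/controller-manager.conf \
  --node-monitor-period=2s \
  --node-monitor-grace-period=16s \
  --pod-eviction-timeout=30s
```

After editing, run `sudo systemctl daemon-reload` and restart the kube-controller-manager service.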

If you deploy the controller-manager as a microservice (pod), check the manifest file for that pod and change the parameters in the container's command section (for example like this), as sketched below.
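As an illustration, on a kubeadm cluster the static pod manifest lives at /etc/kubernetes/manifests/kube-controller-manager.yaml (mentioned in the comments below), and the flags from the question go into the command list. A trimmed sketch, with the image version illustrative and your existing flags kept as they are:

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - name: kube-controller-manager
    image: k8s.gcr.io/kube-controller-manager:v1.14.1  # illustrative version
    command:
    - kube-controller-manager
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --node-monitor-period=2s           # from 5s
    - --node-monitor-grace-period=16s    # from 40s
    - --pod-eviction-timeout=30s         # from 5m
```

Because this is a static pod, the kubelet watches the manifests directory and recreates the pod when the file changes; there is no need to kubectl apply it.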

Veerendra K
  • There is a manifest file that could be relevant: /etc/kubernetes/manifests/kube-controller-manager.yaml. Can I add those flags and apply the manifest with kubectl apply -f kube-controller-manager.yaml? Would that work? – Adi Soyadi Apr 22 '19 at 11:17
  • Yes, you can modify that manifest. You will probably need to restart the kubelet after that. – Vasili Angapov Apr 22 '19 at 11:23
  • Unfortunately, the manifest change results in a CrashLoopBackOff. I also tried /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, but it has no effect. When I run the describe command it shows nothing except: Back-off restarting failed container – Adi Soyadi Apr 22 '19 at 16:38
  • @AdiSoyadi, what is it saying? I don't remember exactly, but can you check how those pods are deployed, i.e. as a `replicaset` or `daemonset` in the `kube-system` namespace? Then open the manifest file for the `replicaset`/`daemonset` and edit it. – Veerendra K Apr 23 '19 at 06:52

It's actually kube-controller-manager. You may also decrease --attach-detach-reconcile-sync-period from 1m to 15 or 30 seconds for kube-controller-manager. This allows for speedier volume attach/detach actions. How you change those parameters depends on how you set up the cluster.
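For instance, on a kubeadm cluster the flag would be added to the same command list in the static pod manifest (a sketch; 15s is just an example value):

```yaml
# Excerpt from /etc/kubernetes/manifests/kube-controller-manager.yaml
    command:
    - kube-controller-manager
    - --attach-detach-reconcile-sync-period=15s  # default is 1m
```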

Vasili Angapov
  • Thanks for the reply. My actual problem is that I cannot find any documentation about kube-controller-manager, and I don't know how to set it up or use it. My cluster: 2 slaves x 1 master, on-premise (VirtualBox). – Adi Soyadi Apr 22 '19 at 11:20
  • Hi, could you please edit /etc/kubernetes/manifests/kube-controller-manager.yaml and add the necessary flags as described by community member @Veerendra [here](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/)? – Mark Apr 30 '19 at 13:35