I have a Jenkins deployment pipeline that uses the Kubernetes plugin. Through the plugin, I create an agent (slave) pod that builds a Node.js application with Yarn. CPU and memory requests and limits are set on the pod.
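For reference, the pod template looks roughly like this (a minimal sketch; the label, image, and resource values are illustrative, not my exact configuration):

```groovy
podTemplate(
    label: 'yarn-build',
    containers: [
        containerTemplate(
            name: 'node',
            image: 'node:16',        // illustrative image tag
            command: 'cat',
            ttyEnabled: true,
            // CPU and memory requests/limits, as mentioned above
            resourceRequestCpu: '500m',
            resourceRequestMemory: '1Gi',
            resourceLimitCpu: '1',
            resourceLimitMemory: '2Gi'
        )
    ]
) {
    node('yarn-build') {
        container('node') {
            sh 'yarn install --frozen-lockfile && yarn build'
        }
    }
}
```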
When the Jenkins master schedules the agent, the pod sometimes (I haven't seen a pattern so far) makes the entire node unreachable, and the node's status changes to Unknown. On careful inspection in Grafana, CPU and memory usage stay well within range, with no visible spike. The only spike is in disk I/O, which peaks at ~4 MiB.
I am not sure if that is the reason the node can no longer function as a cluster member. I need help with a few things here:
a) How can I diagnose, in depth, why the node leaves the cluster? (A sketch of the basic checks I already know of follows this list.)
b) If the reason is disk IOPS, are there any default requests/limits for IOPS at the Kubernetes level? (What I mean is sketched after the PS below.)
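For (a), the only checks I know of so far are the standard node/kubelet ones (`<node-name>` is a placeholder):

```sh
# Node conditions (Ready, DiskPressure, PIDPressure, etc.) and recent status
kubectl describe node <node-name>

# Cluster events around the time the node went Unknown
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp

# On the node itself: kubelet logs around the incident
journalctl -u kubelet --since "1 hour ago"
```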
PS: I am using EBS (gp2) volumes.
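To clarify what I mean in (b): the only disk-related resource fields I am aware of are the ephemeral-storage ones, which, as far as I understand, cap storage capacity rather than IOPS (values below are illustrative):

```yaml
# Container spec fragment — limits ephemeral storage *size*,
# not disk IOPS, as far as I understand
resources:
  requests:
    ephemeral-storage: "1Gi"
  limits:
    ephemeral-storage: "2Gi"
```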