
Currently we have a set of microservices hosted on a Kubernetes cluster. We are setting HPA values based on rough estimates. I am planning to monitor horizontal pod autoscaling behavior using Grafana to ensure we are not over- or under-allocating resources like CPU and memory, and to come up with possible cost optimization recommendations.

I am new to the Kubernetes world. I need directions on how to achieve this.

1 Answer


tl;dr

  1. Monitor resource consumption of each pod.
  2. Monitor pod restarts and number of replicas.
  3. Use a load test.

Memory

As a starting point, you could monitor the CPU and memory consumption of each pod. For example, you can do something like this:

sum by (pod) (container_memory_usage_bytes{container=...}) /
sum by (pod) (kube_pod_container_resource_requests{resource="memory", container=...})

If you follow the advice given in A Practical Guide to Setting Kubernetes Requests and Limits, the limit setting is derived from the request setting. With such a query you can analyse whether the requested memory per pod is roughly realistic. Depending on the configuration of the autoscaler, this can be helpful. You could define a Grafana alert rule that triggers an alarm whenever the ratio between used and requested memory exceeds some threshold.
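As a minimal sketch, such an alert expression could look like this (the 0.9 threshold is an assumption that you would tune to your workloads):

sum by (pod) (container_memory_usage_bytes{container=...}) /
sum by (pod) (kube_pod_container_resource_requests{resource="memory", container=...}) > 0.9

For the cost-optimization angle, the inverse check is just as useful: a ratio that stays well below 1 for long periods points at over-provisioned requests.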

Restarts

If a container exceeds its memory limit, it gets OOM-killed and Kubernetes will restart it. With the following metric you can monitor restarts:

sum by (pod) (increase(kube_pod_container_status_restarts_total{...}[1h]))
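If you want to verify that those restarts are actually caused by the memory limit, kube-state-metrics also exposes the reason of the last termination. A sketch, filtering for OOM kills:

kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}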

CPU

CPU usage is also relevant. If your application itself exports CPU metrics (for example via Micrometer/Spring Boot), you can use:

process_cpu_usage{container="..."}

For additional queries, have a look at Prometheus queries to get CPU and Memory usage in kubernetes pods.
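If your services do not export process_cpu_usage, a sketch of the same used-versus-requested ratio based on the cAdvisor metrics (assuming the kubelet/cAdvisor endpoints are scraped) would be:

sum by (pod) (rate(container_cpu_usage_seconds_total{container=...}[5m])) /
sum by (pod) (kube_pod_container_resource_requests{resource="cpu", container=...})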

Replicas

Now that you have basic metrics in place, what about the autoscaler itself? You can track the number of replicas it currently runs like this:

kube_horizontalpodautoscaler_status_current_replicas{}

Note that you might need to filter this metric by the label horizontalpodautoscaler. But I recommend that you first run the query without filters to get an overview of all running autoscalers.
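To see how close each autoscaler runs to its configured maximum, you could, as a sketch, divide the current replica count by the maximum from the spec:

kube_horizontalpodautoscaler_status_current_replicas /
kube_horizontalpodautoscaler_spec_max_replicas

A value that sits at 1 for long periods means the autoscaler is pinned at its maximum, which is exactly the situation the next check is about.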

To keep costs under control, autoscaling is usually limited to a maximum number of replicas. If you are constantly running at that maximum, you might want to check whether the configured maximum is too low. With kubectl you can check the status like this:

kubectl describe hpa

Have a look at the ScalingLimited condition.

With Grafana, filter on status="true" to see only autoscalers that are currently hitting their limit:

kube_horizontalpodautoscaler_status_condition{condition="ScalingLimited", status="true"}

A list of kubernetes metrics can be found at kube-state-metrics. Have a look at Horizontal Pod Autoscaler Metrics and ReplicationController metrics.

Use a load test

In the HorizontalPodAutoscaler Walkthrough there is a point where you need to increase the load on your application. There are several tools you can use for this, such as Apache Bench or JMeter.
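As a minimal example with Apache Bench (the URL and the request numbers are placeholders for your own service and traffic profile):

ab -n 100000 -c 50 http://<your-service>/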

In my experience, upscaling is easy to achieve; the tricky part is the downscaling. Therefore, you need to experiment with both increasing and decreasing the load.
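While the load test runs, you can watch the autoscaler react (and later scale back down) directly with kubectl:

kubectl get hpa --watch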