tl;dr
- Monitor resource consumption of each pod.
- Monitor pod restarts and number of replicas.
- Use a load test.
Memory
As a starting point, you could monitor the CPU and memory consumption of each pod. For memory, you can use a query like this, which divides the memory used by the memory requested per pod:
sum by (pod) (container_memory_usage_bytes{container="..."})
/
sum by (pod) (kube_pod_container_resource_requests{container="...", resource="memory"})
If you follow the advice given in A Practical Guide to Setting Kubernetes Requests and Limits, the limit is derived from the request. With this query you can check whether the requested memory per pod is roughly realistic. This matters in particular if the autoscaler scales on memory utilization, because the HPA measures utilization as a percentage of the request. You could define a Grafana alert rule that fires if the ratio between used and requested memory exceeds a chosen threshold.
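The check behind such an alert rule can be sketched in Python. This is a minimal sketch, assuming the per-pod usage and request values have already been fetched (for example from the Prometheus HTTP API); the functions memory_ratio and pods_to_alert and the thresholds 0.5 and 0.9 are hypothetical. In Grafana you would express the same comparison in the alert rule itself.

```python
"""Sketch: flag pods whose memory usage is far from their memory request.

Assumption: per-pod values were fetched beforehand (e.g. from the Prometheus
HTTP API) and are passed in as plain dicts keyed by pod name.
"""


def memory_ratio(usage_bytes: dict[str, float],
                 request_bytes: dict[str, float]) -> dict[str, float]:
    """Return used/requested memory per pod, mirroring the PromQL division.

    Like PromQL's binary `/`, only pods present on both sides are kept.
    Pods with a zero request are skipped to avoid dividing by zero.
    """
    return {
        pod: usage_bytes[pod] / request_bytes[pod]
        for pod in usage_bytes.keys() & request_bytes.keys()
        if request_bytes[pod] > 0
    }


def pods_to_alert(ratios: dict[str, float],
                  low: float = 0.5, high: float = 0.9) -> list[str]:
    """Return pods whose ratio lies outside [low, high] (hypothetical thresholds).

    A high ratio means the request is probably too small; a low ratio means
    memory is requested but not used.
    """
    return sorted(pod for pod, r in ratios.items() if r < low or r > high)


if __name__ == "__main__":
    MiB = 1024 * 1024
    # Made-up values for three replicas of a hypothetical "api" deployment.
    usage = {"api-1": 450 * MiB, "api-2": 120 * MiB, "api-3": 300 * MiB}
    requests = {"api-1": 512 * MiB, "api-2": 512 * MiB, "api-3": 512 * MiB}
    print(pods_to_alert(memory_ratio(usage, requests)))  # ['api-2']
```

Alerting on both ends is a design choice: the upper bound warns before the limit is hit, the lower bound points at over-provisioned requests that make memory-based autoscaling react too late.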
Restarts
If a container exceeds its memory limit, it is OOM-killed and Kubernetes restarts it. You can monitor restarts with the following query:
sum by (pod) (increase(kube_pod_container_status_restarts_total{...}[1h]))
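To get a feeling for what increase() returns, here is a simplified Python sketch. It assumes a list of raw counter samples for one container and only sums the observed increments, treating a drop in value as a counter reset, as Prometheus does for counters. Prometheus additionally extrapolates to the boundaries of the [1h] window, so real results can be non-integers; simple_increase is a hypothetical helper, not a Prometheus API.

```python
"""Sketch: roughly what increase() computes for a restart counter.

Simplification (assumption): Prometheus also extrapolates the result to the
edges of the range window; this sketch only sums the observed increments.
"""


def simple_increase(samples: list[float]) -> float:
    """Sum the increments of a counter over consecutive samples.

    A drop in value is treated as a counter reset, so the new value itself
    counts as the increment (the counter is assumed to have restarted at 0).
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # Counter reset: count everything since the restart from zero.
            total += curr
    return total


if __name__ == "__main__":
    # Hypothetical restart counter for one container, sampled over one hour.
    restarts = [3, 3, 4, 4, 6, 1, 2]
    print(simple_increase(restarts))  # 5.0
```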
CPU
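CPU usage is also relevant. If your application exposes it (for example via Micrometer in a Spring Boot application), you can query:
process_cpu_usage{container="..."}
For a container-level metric that does not depend on the application, you can use the cAdvisor counter instead:
sum by (pod) (rate(container_cpu_usage_seconds_total{container="..."}[5m]))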
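If you work with a cumulative CPU counter instead of a gauge, for example cAdvisor's container_cpu_usage_seconds_total, the value only becomes meaningful after rate(): CPU-seconds consumed divided by wall-clock seconds gives the number of cores used. The Python sketch below shows that arithmetic and relates it to a CPU request; the sample values and the 500m request are made up, and counter resets are ignored for brevity.

```python
"""Sketch: turn a cumulative CPU-seconds counter into cores used.

Assumption: two samples (timestamp in seconds, counter value in CPU-seconds)
of a counter such as container_cpu_usage_seconds_total, plus the container's
CPU request in millicores. Counter resets are not handled here.
"""


def cpu_cores(t0: float, cpu0: float, t1: float, cpu1: float) -> float:
    """Average number of cores used between two samples (what rate() yields)."""
    if t1 <= t0:
        raise ValueError("samples must be in chronological order")
    return (cpu1 - cpu0) / (t1 - t0)


def utilization_of_request(cores: float, request_millicores: int) -> float:
    """Usage as a fraction of the CPU request, the basis of HPA CPU utilization."""
    return cores / (request_millicores / 1000)


if __name__ == "__main__":
    # 90 CPU-seconds consumed in a 300-second window -> 0.3 cores.
    cores = cpu_cores(0.0, 1200.0, 300.0, 1290.0)
    print(round(cores, 3))  # 0.3
    # Against a 500m request, that is 60 % utilization.
    print(round(utilization_of_request(cores, 500), 2))  # 0.6
```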
For additional queries, have a look at Prometheus queries to get CPU and Memory usage in kubernetes pods.
Replicas
Now that you have basic metrics in place, what about the autoscaler itself? You can get the current number of replicas like this:
kube_horizontalpodautoscaler_status_current_replicas{}
Note that you might need to filter this metric by the label horizontalpodautoscaler. However, I recommend running the query without filters first to get information about all running autoscalers.
For cost control, autoscaling is usually capped at a maximum number of replicas. If the autoscaler is running at that maximum, you might want to check whether the maximum is too low. With kubectl you can check the status like this:
kubectl describe hpa
Have a look at the condition ScalingLimited.
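In Grafana, with Prometheus as the data source, kube-state-metrics exposes the same condition:
kube_horizontalpodautoscaler_status_condition{condition="ScalingLimited", status="true"} == 1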
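The ScalingLimited condition is easier to interpret with the replica calculation from the Kubernetes HPA documentation in mind: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The following Python sketch applies that formula, the default tolerance of 0.1, and the clamping to minReplicas and maxReplicas. It is a simplification: the real controller also handles stabilization windows, scaling policies, missing metrics and pods that are not ready yet, and the names below are hypothetical.

```python
"""Sketch of the core HPA replica calculation and when it hits its limits.

Simplified (assumption): ignores stabilization windows, scaling policies,
missing metrics and not-yet-ready pods, which the real controller handles.
"""
import math
from dataclasses import dataclass


@dataclass
class Decision:
    desired: int   # replica count after clamping to [min, max]
    limited: bool  # True when clamping changed the result (cf. ScalingLimited)


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> Decision:
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        raw = current
    else:
        raw = math.ceil(current * ratio)
    clamped = max(min_replicas, min(max_replicas, raw))
    return Decision(desired=clamped, limited=clamped != raw)


if __name__ == "__main__":
    # 4 pods at 90 % CPU with a 50 % target would need ceil(4 * 1.8) = 8 pods,
    # but maxReplicas is 6, so the autoscaler is capped.
    print(desired_replicas(4, 90, 50, min_replicas=2, max_replicas=6))
    # Decision(desired=6, limited=True)
```

If your dashboards show the limited flag for long periods while load keeps rising, that is the situation where raising maxReplicas (or the requests) deserves a look.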
A list of Kubernetes metrics can be found in the kube-state-metrics documentation. Have a look at the Horizontal Pod Autoscaler Metrics and the ReplicationController metrics.
Use a load test
The HorizontalPodAutoscaler Walkthrough includes a step where you increase the load on your application. There are several tools you can use for this, such as Apache Bench or JMeter.
In my experience, scaling up is easy to achieve; the tricky part is scaling down. Therefore, test with both increasing and decreasing load.
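If you want to script a ramp up and down without Apache Bench or JMeter, a minimal sketch using only the Python standard library could look like this. TARGET_URL, the step sizes and the durations are hypothetical placeholders. One reason scaling down feels slow is that, by default, the HPA applies a 5-minute stabilization window before reducing replicas, so each step should last longer than that if you want to observe the downscaling.

```python
"""Sketch: ramp load up, hold it, then ramp it down to observe scale-down.

Assumptions: TARGET_URL is a hypothetical endpoint of your service; step sizes
and durations are placeholders. Only the standard library is used.
"""
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "http://my-service.example/"  # hypothetical, replace with your service


def load_profile(peak: int, step: int) -> list[int]:
    """Concurrency levels rising in `step` increments to `peak`, then back to 0."""
    up = list(range(step, peak + 1, step))
    if not up or up[-1] != peak:
        up.append(peak)
    down = list(reversed(up[:-1])) + [0]
    return up + down


def worker(url: str, stop: threading.Event) -> None:
    """Send requests in a loop until `stop` is set; failures are ignored."""
    while not stop.is_set():
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                resp.read()
        except OSError:
            pass  # URLError and timeouts are OSErrors; keep generating load


def run(url: str, profile: list[int], seconds_per_step: float) -> None:
    """Hold each concurrency level for `seconds_per_step` seconds."""
    for level in profile:
        stop = threading.Event()
        with ThreadPoolExecutor(max_workers=max(level, 1)) as pool:
            for _ in range(level):
                pool.submit(worker, url, stop)
            print(f"{time.strftime('%H:%M:%S')} concurrency={level}")
            time.sleep(seconds_per_step)
            stop.set()  # leaving the with-block waits for workers to finish
```

For example, run(TARGET_URL, load_profile(peak=40, step=10), seconds_per_step=360) steps through 10, 20, 30 and 40 concurrent workers and back down to 0, one step every six minutes, while you watch the replica count. Threads are adequate for I/O-bound HTTP calls, but for high request rates a dedicated tool is the better choice.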