Resource-utilization based liveness checks in Kubernetes

Question

In Kubernetes, we have a liveness probe that periodically checks whether the container is accessible and, if it is not, kills it and spawns a new one.

We have a Java webapp, and in most cases I see the application become unavailable due to memory pressure. We have a liveness probe, but since the health check service call doesn't take much memory, it succeeds even while many other requests that need more memory are left hanging.

The GC runs continuously trying to reclaim memory, but to no avail, and the instance never recovers. In such a state I would like Kubernetes to kill the pod, but since the liveness probe still succeeds, it doesn't. One way to handle this could be to make the liveness probe a more resource-intensive operation, but then it would consume more cycles and put additional load on the system.

So I would like some kind of liveness check that monitors the slope of the graph of garbage collection counts for the Java process. Put another way, I want my liveness probe to depend on telemetry data. Is there any way to achieve that?

score 0 · Answer 1 · answered Jan 04 '19 at 15:08

Health probes often take the form of HTTP requests that check the status code returned by an endpoint. However, you can also execute a command as a health check, and the Kubernetes documentation provides an example that runs cat on a file. Instead of running cat, you could run a custom script that checks the stat you care about (e.g. Java heap size). If the script is complex, you might want to include it in your image or mount it into the container from a ConfigMap. There are also ways to get metrics other than running bash commands: you could query the k8s metrics API, or you could have your Java app report its state directly through a REST endpoint that the probe calls (e.g. something like Spring Boot Actuator).
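As a rough illustration of the "report directly with a REST endpoint" option, here is a minimal sketch that uses only the JDK. It samples the cumulative GC time from the standard GarbageCollectorMXBean beans and returns 503 when the share of wall-clock time spent in GC over the last sample window crosses a threshold. The class name, the /healthz/gc path, port 8081, the 10-second window and the 50% threshold are all assumptions made up for this example, not values from the question, and would need tuning for a real workload.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class GcLivenessCheck {
    // Hypothetical tuning values; pick them from your own telemetry.
    static final double MAX_GC_FRACTION = 0.5; // unhealthy at >= 50% of time in GC
    static final long SAMPLE_SECONDS = 10;     // length of one sample window

    private volatile double lastGcFraction = 0.0;
    private long prevGcMillis = totalGcMillis();
    private long prevWallNanos = System.nanoTime();

    // Sum of accumulated collection time across all collectors, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Share of a window spent in GC, clamped to the range [0, 1].
    static double gcFraction(long gcMillisDelta, long wallMillisDelta) {
        if (wallMillisDelta <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) gcMillisDelta / wallMillisDelta));
    }

    static boolean isHealthy(double fraction) {
        return fraction < MAX_GC_FRACTION;
    }

    // Called only from the single scheduler thread, so the prev* fields need no locking.
    void sample() {
        long gcNow = totalGcMillis();
        long wallNow = System.nanoTime();
        long wallMillis = TimeUnit.NANOSECONDS.toMillis(wallNow - prevWallNanos);
        lastGcFraction = gcFraction(gcNow - prevGcMillis, wallMillis);
        prevGcMillis = gcNow;
        prevWallNanos = wallNow;
    }

    public static void main(String[] args) throws Exception {
        GcLivenessCheck check = new GcLivenessCheck();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(check::sample, SAMPLE_SECONDS, SAMPLE_SECONDS, TimeUnit.SECONDS);

        // Tiny HTTP endpoint for the probe: 200 when healthy, 503 under GC pressure.
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        server.createContext("/healthz/gc", exchange -> {
            double fraction = check.lastGcFraction;
            byte[] body = String.format("gcFraction=%.2f%n", fraction).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(isHealthy(fraction) ? 200 : 503, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }
}
```

In a real webapp the sampler and handler would live inside the application rather than in a separate main method, and the liveness probe would be an httpGet against that path. Because the handler only reads a precomputed value, the probe stays cheap, which addresses the concern about making the health check itself resource intensive. One caveat: a JVM that is thrashing badly may also be slow to run the sampler thread, so a probe timeout is still worth setting.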

I should say it seems to me it would be better if memory usage could be made stable or the load balanced across Pods (maybe even using HPA). But presumably you've got your reasons. — Ryan Dawson, Jan 04 '19 at 15:26
