In Kubernetes, a liveness probe periodically checks whether a container is still responsive and, if it is not, the container is killed and a new one is started.
We run a Java webapp, and in most cases the application becomes unavailable because of memory pressure. We have a liveness probe, but because the health check call needs very little memory, it keeps succeeding even while many other requests that need more memory are left hanging.
The GC runs continuously trying to reclaim memory, but to no avail, and the instance never recovers. In this state I would like Kubernetes to kill the pod, but because the liveness probe still succeeds, it doesn't. One option would be to make the liveness probe a more resource-intensive operation, but that would consume more cycles and put additional load on the system.
So I would like a liveness check that monitors the slope of the garbage collection count of the Java process over time. Put another way, I want my liveness probe to depend on telemetry data. Is there any way to achieve that?
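
To make the idea concrete, here is a minimal sketch of such a check. It assumes the health endpoint runs inside the same JVM as the webapp. The class name `GcPressureCheck`, the factory `forThisJvm`, and the collections-per-second threshold are hypothetical names and values chosen for illustration. Only `ManagementFactory.getGarbageCollectorMXBeans()` and `GarbageCollectorMXBean.getCollectionCount()` come from the standard JDK.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper: reports "unhealthy" when the GC count has grown
 * faster than a configured rate since the previous probe.
 */
final class GcPressureCheck {
    private final LongSupplier gcCount;      // cumulative number of collections
    private final LongSupplier clockMillis;  // wall-clock time in milliseconds
    private final double maxCollectionsPerSecond;
    private long lastCount;
    private long lastClock;

    GcPressureCheck(LongSupplier gcCount, LongSupplier clockMillis,
                    double maxCollectionsPerSecond) {
        this.gcCount = gcCount;
        this.clockMillis = clockMillis;
        this.maxCollectionsPerSecond = maxCollectionsPerSecond;
        this.lastCount = gcCount.getAsLong();
        this.lastClock = clockMillis.getAsLong();
    }

    /** Sums the collection counts of all collectors in this JVM. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Convenience factory wired to the real JVM counters and system clock. */
    static GcPressureCheck forThisJvm(double maxCollectionsPerSecond) {
        return new GcPressureCheck(GcPressureCheck::totalGcCount,
                System::currentTimeMillis, maxCollectionsPerSecond);
    }

    /**
     * Call once per probe. Returns true while the GC rate since the
     * previous call is at or below the threshold.
     */
    synchronized boolean isHealthy() {
        long countNow = gcCount.getAsLong();
        long clockNow = clockMillis.getAsLong();
        long countDelta = countNow - lastCount;
        long elapsedMillis = clockNow - lastClock;
        lastCount = countNow;
        lastClock = clockNow;
        if (elapsedMillis <= 0) {
            return true; // no measurable interval yet, so do not fail the probe
        }
        double collectionsPerSecond = countDelta * 1000.0 / elapsedMillis;
        return collectionsPerSecond <= maxCollectionsPerSecond;
    }
}
```

The existing health endpoint could call `isHealthy()` and answer with a 5xx status when it returns false, since an HTTP probe treats only status codes from 200 to 399 as success. A sensible threshold depends on the collector and the workload, so it would have to be tuned against real measurements. The same shape should also work with `getCollectionTime()` if time spent in GC turns out to be a better signal than the raw count.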