
We are running a JDK 17 Spring Boot application on our production server with the following configuration:

  • JDK Vendor : Amazon Corretto (17.0.6)
  • K8S version : 1.17
  • Max pod memory : 5GB
  • Min pod memory : 5GB
  • Xmx : 2GB
  • Xms : 2GB

The problem we are running into is that roughly every 24 hours the application gets OOM-killed by K8s (exit code 137). A few observations so far:

  • There is no leak in heap memory; this is confirmed from multiple heap dumps as well as GC logs.
  • No leak is observed in native memory either, based on native memory dumps. The maximum reserved size seen in the native memory dump is 3.5GB. Dumps were taken at periodic intervals, and no growth in any non-heap area was observed.
  • We have verified that the RSS of the application grows gradually to 5GB by the time the OOM kill happens.
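
One way to see where RSS growth is coming from is to sum and diff the `Rss` entries in the process's `/proc/<pid>/smaps` over time. A minimal sketch, fed sample smaps-style input here so it runs anywhere (the pid in the comment is a placeholder):

```shell
# In production, point the filter at the live process:
#   awk '/^Rss:/ {sum += $2} END {print sum " kB"}' /proc/<pid>/smaps
# Sample smaps-style input so the sketch is runnable without a target process:
sample='Rss:                1024 kB
Rss:                2048 kB'
printf '%s\n' "$sample" | awk '/^Rss:/ {sum += $2} END {print sum " kB"}'
# prints "3072 kB"
```

Capturing per-mapping Rss periodically (e.g. with `pmap -x <pid>`) and diffing the snapshots shows which specific mapping is growing, even when heap and native dumps look clean.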

We have tried tweaking Xmx/Xms and a few other GC parameters (e.g. disabling adaptive IHOP), but nothing has helped so far. Possibly there is some leak, and it shows up as the growing RSS, but it is not reflected in the native memory dump.

  • For nodes running Linux kernel 5.0 or later (e.g. Ubuntu 18.04.3) there is a known [issue](https://b.corp.google.com/issues/150815439#comment12) where **kubelet falsely reports a pod cgroup OOM as a system OOM**. To verify whether it was a false positive, check the node's kernel log for '**oom-kill**' messages. If you see '**memcg=/kubepods/**' then it is a standard container OOM and can be disregarded as a system OOM. The [fix](https://github.com/google/cadvisor/pull/2817) is expected to land in K8s 1.21. – Veera Nagireddy Aug 14 '23 at 07:50
  • Apart from the above comment, please go through this article [Java Container crashes with “Error 137 (out of memory)”](https://blogs.sap.com/2022/10/10/java-container-crashes-with-error-137-out-of-memory/), which may help resolve your issue. – Veera Nagireddy Aug 18 '23 at 06:20
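
The kernel-log check described in the first comment can be sketched as follows; the sample log line is illustrative (on the node itself you would read the real lines from `dmesg`):

```shell
# On the node, look for OOM killer entries in the kernel log:
#   dmesg | grep -i 'oom-kill'
# A line containing 'memcg=/kubepods/' is a container (cgroup) OOM,
# not a system OOM. The sample line below is illustrative only.
line='Memory cgroup out of memory: Killed process 1234 (java) memcg=/kubepods/burstable/pod-example'
case "$line" in
  *memcg=/kubepods/*) echo "container OOM" ;;
  *)                  echo "system OOM" ;;
esac
# prints "container OOM"
```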

1 Answer


Xmx will control the maximum heap size, but that is not the only memory region a JVM manages (see e.g. here or here). If the memory dumps do not reveal which memory region is growing that much, consider the possibility of a bug in the JVM itself.
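
Beyond the heap there are metaspace, the code cache, thread stacks, GC structures, and direct buffers, and most of them can be capped explicitly. A sketch of a launch command that bounds these regions and enables Native Memory Tracking so growth can be diffed over time (the sizes and `app.jar` are illustrative assumptions, not recommendations for your workload):

```shell
# Cap the main non-heap regions and enable NMT (values are illustrative):
java -Xms2g -Xmx2g \
     -XX:MaxMetaspaceSize=256m \
     -XX:ReservedCodeCacheSize=240m \
     -XX:MaxDirectMemorySize=512m \
     -Xss1m \
     -XX:NativeMemoryTracking=summary \
     -jar app.jar
# Then, while the process runs, diff NMT snapshots to see which region grows:
#   jcmd <pid> VM.native_memory baseline
#   jcmd <pid> VM.native_memory summary.diff
```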

To verify, tweak more parameters or switch to another JVM implementation and see whether the behaviour changes.
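
One parameter worth tweaking sits outside the JVM entirely: glibc's malloc keeps per-thread arenas, and arena fragmentation can grow RSS in a way that no heap or native dump attributes to any JVM region. Capping the arena count is a common mitigation to test; this is an assumption to verify against your workload, not a confirmed fix:

```shell
# Set before starting the JVM (e.g. in the container entrypoint).
# Fewer arenas trade some allocator concurrency for lower fragmentation.
export MALLOC_ARENA_MAX=2
```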

Queeg
  • I am not able to figure out why RSS is increasing when nothing unusual shows up in either the native or heap dump. How do I figure out what else is hogging process memory? – Musaddique Hossain Aug 12 '23 at 12:04
  • If the memory dumps do not reveal more information, consider the JVM to be buggy. – Queeg Aug 12 '23 at 15:31