So we have a medium-sized JVM-based application, and for the past week or two it has been getting OOM-killed regularly by Docker. I have read everything I could find on Java 8 memory consumption in containers: the experimental cgroup flag, MaxRAM, controlling non-heap size, tuning the GC, and so on. But there is no way to get the JVM to throw its own OutOfMemoryError in our case; it's always Docker killing the container with exit code 137.
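For clarity, the experimental cgroup flag mentioned above is the one introduced in 8u131; schematically it would be passed like this (image name and jar are placeholders, not our actual setup):

docker run -m 2000m some-app-image \
    java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar app.jar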
E.g. when giving the container 2000M of memory and letting the JVM see 80% of that:

-XX:MaxRAM=1600M -XX:MaxRAMFraction=2

which means the heap will grow to at most 800M, the result is still an OOM kill by Docker. We started out with -Xmx between 800M and 1600M - same result.
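Concretely, that first attempt amounts to something like this (image name and jar are placeholders):

docker run -m 2000m some-app-image \
    java -XX:MaxRAM=1600M -XX:MaxRAMFraction=2 -jar app.jar
# effective heap cap = MaxRAM / MaxRAMFraction = 1600M / 2 = 800M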
When controlling the non-heap size (assuming a maximum of 100 threads):

-XX:MaxRAM=1050M -XX:MaxRAMFraction=1 -Xss1M -XX:ReservedCodeCacheSize=128M -XX:MaxDirectMemorySize=64M -XX:CompressedClassSpaceSize=128M -XX:MaxMetaspaceSize=128M

and arriving at (100 * Xss) + 128M + 64M + 128M + 128M = 548M for the entire non-heap part of the JVM's memory requirements, we take the 2000M of container memory, subtract a 20% margin and the 548M of non-heap, which gives us -XX:MaxRAM=1050M - and we still get OOM-killed.
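Spelled out, the budget behind those numbers and the resulting invocation look roughly like this (image name and jar are placeholders; the 100-thread count and the 20% margin are the assumptions stated above):

# non-heap budget: 100 * 1M thread stacks + 128M code cache + 64M direct memory
#                  + 128M compressed class space + 128M metaspace = 548M
# heap budget:     2000M - 20% margin - 548M non-heap ≈ 1050M -> -XX:MaxRAM=1050M
docker run -m 2000m some-app-image \
    java -XX:MaxRAM=1050M -XX:MaxRAMFraction=1 \
         -Xss1M -XX:ReservedCodeCacheSize=128M -XX:MaxDirectMemorySize=64M \
         -XX:CompressedClassSpaceSize=128M -XX:MaxMetaspaceSize=128M \
         -jar app.jar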
Not sure if it matters, but we run a DC/OS cluster and it's Marathon that reports the task kills due to OOM. My understanding, though, is that it's the underlying Docker engine's behaviour that gets reported.