So we have a medium-sized JVM-based application, and for the past week or two it has been getting OOM-killed regularly by Docker. I have read everything I could find on Java 8 memory consumption in containers: the experimental cgroup flag, MaxRAM, controlling non-heap size, tuning the GC, and so on. But there is no way to get the JVM to throw its own OutOfMemoryError in our case; it's always Docker killing the container with exit code 137.
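For clarity, the experimental cgroup flag mentioned above is the one introduced in 8u131; schematically it would be passed like this (image name and jar are placeholders, not our actual setup):

docker run -m 2000m some-app-image \
    java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar app.jar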
E.g. when giving the container 2000M of memory and letting the JVM see 80% of that:

-XX:MaxRAM=1600M -XX:MaxRAMFraction=2

which means the heap will grow to at most 800M, the result is still an OOM kill by Docker. We started out with -Xmx between 800M and 1600M - same result.
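Concretely, that first attempt amounts to something like this (image name and jar are placeholders):

docker run -m 2000m some-app-image \
    java -XX:MaxRAM=1600M -XX:MaxRAMFraction=2 -jar app.jar
# effective heap cap = MaxRAM / MaxRAMFraction = 1600M / 2 = 800M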
When controlling the non-heap size (assuming a maximum of 100 threads):

-XX:MaxRAM=1050M -XX:MaxRAMFraction=1 -Xss1M -XX:ReservedCodeCacheSize=128M -XX:MaxDirectMemorySize=64M -XX:CompressedClassSpaceSize=128M -XX:MaxMetaspaceSize=128M

and arriving at (100 * Xss) + 128M + 64M + 128M + 128M = 548M for the entire non-heap part of the JVM's memory requirements, we take the 2000M of container memory, subtract a 20% margin and the 548M of non-heap, which gives us -XX:MaxRAM=1050M - and we still get OOM-killed.
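Spelled out, the budget behind those numbers and the resulting invocation look roughly like this (image name and jar are placeholders; the 100-thread count and the 20% margin are the assumptions stated above):

# non-heap budget: 100 * 1M thread stacks + 128M code cache + 64M direct memory
#                  + 128M compressed class space + 128M metaspace = 548M
# heap budget:     2000M - 20% margin - 548M non-heap ≈ 1050M -> -XX:MaxRAM=1050M
docker run -m 2000m some-app-image \
    java -XX:MaxRAM=1050M -XX:MaxRAMFraction=1 \
         -Xss1M -XX:ReservedCodeCacheSize=128M -XX:MaxDirectMemorySize=64M \
         -XX:CompressedClassSpaceSize=128M -XX:MaxMetaspaceSize=128M \
         -jar app.jar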
Not sure if it matters, but we run a DC/OS cluster and it's Marathon that reports the task kills due to OOM. My understanding, though, is that it's the underlying Docker engine's behaviour that gets reported.