
We currently have a problem with a Java native memory leak. The server is quite big (40 CPUs, 128GB of memory). The Java heap size is 64G and we run a very memory-intensive application that reads a lot of data into strings with about 400 threads and throws them away after a few minutes.

So the heap fills up very fast, but the stuff on the heap becomes obsolete and can be GCed very fast, too. That is why we have to use G1, to avoid stop-the-world pauses lasting minutes.

Now, that seems to work fine: the heap is big enough to run the application for days, nothing is leaking there. However, the Java process keeps growing over time until all 128G are used and the application crashes with an allocation failure.

I've read a lot about native Java memory leaks, including the glibc issue with the maximum number of malloc arenas (we are on wheezy with glibc 2.13, so no fix is possible here by setting MALLOC_ARENA_MAX=1 or 4 without a dist upgrade).
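For reference, on a glibc recent enough to actually honor the variable, this would typically be set before the JVM starts, e.g. in Tomcat's bin/setenv.sh (the file name follows the usual Tomcat convention; the value 4 is just illustrative, not something we can use on 2.13):

export MALLOC_ARENA_MAX=4    # limit glibc to 4 malloc arenas instead of the 64-bit default of 8 * number of cores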

So we tried jemalloc, which gave us profiling graphs for inuse-space and inuse-objects (see the linked images).

I don't understand what the issue is here. Does anyone have an idea?

If I set MALLOC_CONF="narenas:1" for jemalloc as an environment variable for the Tomcat process running our app, could the process still somehow end up using the glibc malloc anyway?
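For context, this is a minimal sketch of how jemalloc is usually wired into a Tomcat process; the library path and the setenv.sh location are assumptions, not taken from our setup. MALLOC_CONF is only read by jemalloc itself, so it has no effect unless the library is actually loaded:

# bin/setenv.sh
export LD_PRELOAD=/usr/local/lib/libjemalloc.so   # route the JVM's native allocations through jemalloc
export MALLOC_CONF="narenas:1"                    # jemalloc option; glibc malloc ignores this variable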

This is our G1 setup, maybe there is some issue here?

-XX:+UseCompressedOops
-XX:+UseNUMA
-XX:NewSize=6000m
-XX:MaxNewSize=6000m
-XX:NewRatio=3
-XX:SurvivorRatio=1
-XX:InitiatingHeapOccupancyPercent=55
-XX:MaxGCPauseMillis=1000
-XX:PermSize=64m
-XX:MaxPermSize=128m
-XX:+PrintCommandLineFlags
-XX:+PrintFlagsFinal
-XX:+PrintGC
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution
-XX:-UseAdaptiveSizePolicy
-XX:+UseG1GC
-XX:MaxDirectMemorySize=2g
-Xms65536m
-Xmx65536m
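As a hedged diagnostic idea (an assumption on my part, not something from our current setup): Native Memory Tracking can break down the JVM's own native allocations, but it requires a JDK 8 HotSpot, and the PermSize flags above suggest we are still on JDK 7. The sketch below uses a placeholder <pid>:

-XX:NativeMemoryTracking=summary
# then, while the process is running, compare snapshots over time:
jcmd <pid> VM.native_memory summary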

Thanks for your help!

TT.
  • It would be really hard to analyze this issue by looking at it through a memory lens alone. I would advise monitoring the process with tools such as Dynatrace to get a better understanding of memory allocation, GC throughput and CPU utilization. – Armaiti Aug 04 '17 at 17:35
  • I have today isolated a G1 native memory leak that only occurred under heavy load and with frequent System.gc() usage. I used Xcode on OS X to first see that the leaked data in the native heap was millions and millions of 32-byte mallocs, and I then used dtrace to determine that the vast majority of those 32-byte mallocs originated from G1. – Reuben Scratton Aug 30 '17 at 16:39

1 Answer


We never called System.gc() explicitly, and in the meantime we stopped using G1, specifying nothing other than -Xms and -Xmx.

We are therefore using nearly all of the 128G for the heap now. The Java process memory usage is high, but constant for weeks. I'm sure this is some G1 or at least general GC issue. The only disadvantage of this "solution" is high GC pauses, but they decreased from up to 90s to about 1-5s with the bigger heap, which is OK for the benchmark we run on our servers.
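For illustration, the effective JVM settings now boil down to something like the following; the 110g figure is an assumption standing in for "nearly all the 128G", and with no collector flag the JVM falls back to its default (the parallel collector on a JDK of this vintage):

-Xms110g
-Xmx110g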

Before that, I played around with the -XX:ParallelGCThreads option, which had a significant influence on the speed of the memory leak when decreasing it from 28 (the default for 40 CPUs) down to 1. The memory graphs looked somewhat like a hand fan when using different values on different instances...

(graph: per-instance memory usage over time for different ParallelGCThreads values)
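For reference, the thread count was varied per instance along these lines (the value 8 is just one illustrative setting, not a recommendation):

-XX:+UseG1GC
-XX:ParallelGCThreads=8    # tried values from 28 (the default on 40 CPUs) down to 1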