
I was recently testing a presentation app under a relatively high concurrent load. The application is a Java application running on the HotSpot JVM (1.8.0_111).

I could get a maximum throughput of around 400 TPS with a 4G heap and parallel throughput collector. The throughput chart (as a function of load) is shown below.

[Chart: throughput as a function of load with the parallel collector]

Because Oracle suggests using G1GC for heap sizes larger than 4G, I wanted to try G1 and see if this benefits my application throughput in any way.

To my surprise, with G1GC, I saw the below throughput trend.

[Chart: throughput trend with G1GC, noticeably lower]

I was genuinely surprised and decided to dig into it to see what happened. This is what I found.

[Chart: G1 heap region sizes (eden/old gen) over time, from the GC logs]

I see that initially, out of the 4G heap, 1.5G is allocated to old gen regions and 2.5G to eden regions. But as time progresses and the old gen can no longer fit in 1.5G, the heap gets resized. This seems innocuous. The problem seems to be in the way the resizing happens.

All of the 4G is now allocated to old gen regions and almost none to eden regions. Now, when something needs to be allocated in eden, the heap gets resized again. This becomes the new normal: the heap gets resized back and forth repeatedly, at a huge performance cost to the application.

Has anyone noticed this before with G1GC? Are there any recommendations to work around this issue?

Below is the startup command line with the JVM options.

java -server -Xms4096m -Xmx4096m -XX:MetaspaceSize=100m -XX:MaxMetaspaceSize=100m -XX:MaxDirectMemorySize=512m -XX:MinMetaspaceFreeRatio=0 -XX:MaxMetaspaceFreeRatio=100 -XX:CompressedClassSpaceSize=20m -XX:InitialCodeCacheSize=50m -XX:ReservedCodeCacheSize=50m -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp -Xloggc:/servers/logs/gc.log.2017-01-05-085234 -Djava.awt.headless=true -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -Dio.netty.leakDetectionLevel=simple -XX:MaxDirectMemorySize=512m -Dadmin.connectors.http.port=9000 -Dproxy.connectors.http.port=8080 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8654 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -server -cp ...

JVM options:

-server 
-Xms4096m 
-Xmx4096m 
-XX:MetaspaceSize=100m 
-XX:MaxMetaspaceSize=100m 
-XX:MaxDirectMemorySize=512m 
-XX:MinMetaspaceFreeRatio=0 
-XX:MaxMetaspaceFreeRatio=100 
-XX:CompressedClassSpaceSize=20m 
-XX:InitialCodeCacheSize=50m 
-XX:ReservedCodeCacheSize=50m 
-XX:+AlwaysPreTouch 
-XX:+DisableExplicitGC 
-XX:+PrintGCDetails 
-XX:+PrintGCDateStamps 
-XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/var/tmp 
-Xloggc:/servers/logs/gc.log.2017-01-05-085234 
-Djava.awt.headless=true 
-XX:+UnlockCommercialFeatures 
-XX:+FlightRecorder 
-Dio.netty.leakDetectionLevel=simple 
-XX:MaxDirectMemorySize=512m 
-Dadmin.connectors.http.port=9000 
-Dproxy.connectors.http.port=8080 
-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.port=8654 
-Dcom.sun.management.jmxremote.local.only=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false 
-server 
-cp ...

Please find the GC logs here

    Please can you share all parameters which you set during application start? https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/g1_gc_tuning.html#recommendations – Saulius Next Jan 05 '17 at 15:19
  • @SauliusNext I have now added the list of parameters to the question. For G1GC, the only difference is the addition of -XX:+UseG1GC – Sasidhar Sekar Jan 05 '17 at 15:58
  • 1
    since you already have enabled GC logging you should attach a sufficiently long GC log from the period of interest. – the8472 Jan 05 '17 at 17:08
  • 1
    @SasidharSekar i have filed an issue https://bugs.openjdk.java.net/browse/JDK-8172354 on your behalf. Please check the suggestion provided by senior GC engineer at Oracle "Thomas Schatzl", Do let me know if you need further clarification. – Fairoz Jan 09 '17 at 13:43
  • Thank you @Fairoz. That clarifies a lot of things. I will try out these suggestions to see how things improve. – Sasidhar Sekar Jan 09 '17 at 14:07

1 Answer


The GC logs show many occurrences of the following two GC causes:

  • [GC pause (G1 Humongous Allocation) (young) (initial-mark)
  • [GC pause (G1 Evacuation Pause) (young) (to-space exhausted)

The humongous allocations need space in the old gen, while the to-space exhaustion drives up the young gen size. They are basically competing with each other.

It looks like you're allocating a lot of humongous objects (larger than half the G1 region size) faster than the concurrent cycles initiated by the IHOP can collect them.

You could try increasing the region size. If the humongous objects are large primitive arrays (i.e. not reference arrays), then the experimental eager reclaim feature might help too.
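To illustrate why a larger region size helps: G1 treats any single allocation larger than half a region as humongous. The sketch below is not from the original post; it assumes G1's documented default of sizing regions so the heap has roughly 2048 of them (about 2 MB regions for a 4 GB heap), and shows how raising the region size with `-XX:G1HeapRegionSize` can move an allocation below the humongous threshold.

```java
// Sketch: an allocation is "humongous" in G1 when it exceeds half a region.
public class HumongousThreshold {
    // True if an allocation of the given size would be humongous
    // for the given G1 region size.
    static boolean isHumongous(long allocationBytes, long regionBytes) {
        return allocationBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long threeMb = 3L * 1024 * 1024;
        // With ~2 MB regions (an assumed default for a 4 GB heap),
        // a 3 MB array exceeds the 1 MB threshold and is humongous.
        System.out.println(isHumongous(threeMb, 2L * 1024 * 1024)); // true
        // With -XX:G1HeapRegionSize=8m the threshold is 4 MB,
        // so the same array becomes a regular young-gen allocation.
        System.out.println(isHumongous(threeMb, 8L * 1024 * 1024)); // false
    }
}
```

In other words, if most of your humongous objects are just over half the default region size, bumping the region size to the next power of two can make them ordinary allocations and remove the pressure on the old gen.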

the8472
  • Yes, my application does in fact allocate a lot of humongous objects. Unfortunately, the eager reclaim feature you are pointing to seems to be available only for OpenJDK (I'm using the HotSpot JVM). But, after reading the comments from you and @sauliusNext, I think I'm going to play around with the following options to see how that helps: -XX:MaxGCPauseMillis -XX:G1MixedGCLiveThresholdPercent=<> -XX:G1MixedGCCountTarget=<> -XX:G1OldCSetRegionThresholdPercent=<> – Sasidhar Sekar Jan 09 '17 at 11:49
  • At the end of the entire exercise, I realized that your suggestion to increase the region size resolved my problem. Thank you :-) – Sasidhar Sekar Jan 10 '17 at 11:51