
I have a service that reads data from a source, converts it, and uploads the converted data to a destination. When picking a GC algorithm I was looking for high throughput, which is why I picked Parallel GC. The part that confuses me a lot is why I see a good amount of Full GCs. The nature of the service makes most objects short-lived, as data comes and goes. Here is my GC config:

-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -verbose:GC -XX:+UseParallelGC -XX:NewSize=21200m -XX:MaxNewSize=21200m -server -Xms31200m -Xmx31200m

Basically, I set the total heap size to roughly 30 GB and the young gen size to roughly 20 GB.

Here is a piece of the GC log:

2020-11-08T08:31:07.863+0000: 215.876: [Full GC (Ergonomics) [PSYoungGen: 1233347K->0K(18729472K)] [ParOldGen: 9065862K->6633660K(10240000K)] 10299209K->6633660K(28969472K), [Metaspace: 107588K->107588K(1144832K)], 1.1350824 secs] [Times: user=21.03 sys=0.00, real=1.14 secs]
2020-11-08T08:31:10.627+0000: 218.640: [GC (GCLocker Initiated GC)
Desired survivor size 2699034624 bytes, new threshold 1 (max 15)
[PSYoungGen: 15874560K->1274938K(19073024K)] 22513996K->7914375K(29313024K), 0.1073842 secs] [Times: user=3.10 sys=0.00, real=0.11 secs]
2020-11-08T08:31:12.319+0000: 220.331: [GC (GCLocker Initiated GC)
Desired survivor size 2587885568 bytes, new threshold 1 (max 15)
[PSYoungGen: 17602106K->1307000K(18962944K)] 24253865K->8618788K(29202944K), 0.2492961 secs] [Times: user=7.16 sys=0.00, real=0.25 secs]
2020-11-08T08:31:14.197+0000: 222.210: [GC (GCLocker Initiated GC)
Desired survivor size 2480930816 bytes, new threshold 1 (max 15)
[PSYoungGen: 17634168K->1333816K(19286016K)] 24952891K->9297010K(29526016K), 0.2524904 secs] [Times: user=7.07 sys=0.00, real=0.25 secs]
2020-11-08T08:31:16.165+0000: 224.178: [GC (GCLocker Initiated GC)
Desired survivor size 2386558976 bytes, new threshold 1 (max 15)
[PSYoungGen: 18092600K->1313137K(19181568K)] 26062932K->9992006K(29421568K), 0.2845171 secs] [Times: user=7.85 sys=0.00, real=0.29 secs]
2020-11-08T08:31:18.084+0000: 226.096: [GC (GCLocker Initiated GC)
Desired survivor size 2312110080 bytes, new threshold 1 (max 15)
[PSYoungGen: 18071921K->1242981K(19450880K)] 26751020K->10584632K(29690880K), 0.2523254 secs] [Times: user=6.79 sys=0.00, real=0.26 secs]
2020-11-08T08:31:18.336+0000: 226.349: [Full GC (Ergonomics) [PSYoungGen: 1242981K->0K(19450880K)] [ParOldGen: 9341651K->6896991K(10240000K)] 10584632K->6896991K(29690880K), [Metaspace: 107625K->107625K(1144832K)], 1.0198299 secs] [Times: user=18.34 sys=0.08, real=1.02 secs]
2020-11-08T08:31:21.049+0000: 229.062: [GC (GCLocker Initiated GC)
Desired survivor size 2221408256 bytes, new threshold 1 (max 15)
[PSYoungGen: 17120256K->1356565K(19378176K)] 24043241K->8279559K(29618176K), 0.1089915 secs] [Times: user=3.38 sys=0.00, real=0.11 secs]
2020-11-08T08:31:22.887+0000: 230.899: [GC (GCLocker Initiated GC)
Desired survivor size 2155872256 bytes, new threshold 1 (max 15)
[PSYoungGen: 18476821K->1265473K(19603456K)] 25426058K->8896652K(29843456K), 0.2524566 secs] [Times: user=7.14 sys=0.00, real=0.25 secs]
2020-11-08T08:31:24.888+0000: 232.901: [GC (GCLocker Initiated GC)
Desired survivor size 2092433408 bytes, new threshold 1 (max 15)
[PSYoungGen: 18699585K->1388375K(19539456K)] 26345045K->9562491K(29779456K), 0.2113546 secs] [Times: user=5.59 sys=0.00, real=0.21 secs]
2020-11-08T08:31:26.819+0000: 234.832: [GC (GCLocker Initiated GC)
Desired survivor size 2030043136 bytes, new threshold 1 (max 15)
[PSYoungGen: 18822487K->1308016K(19726336K)] 27003840K->10002863K(29966336K), 0.2078162 secs] [Times: user=6.10 sys=0.00, real=0.21 secs]
2020-11-08T08:31:28.868+0000: 236.881: [GC (GCLocker Initiated GC)
Desired survivor size 2030043136 bytes, new threshold 1 (max 15)
[PSYoungGen: 18990960K->1521040K(19665408K)] 27712283K->10780549K(29905408K), 0.2373748 secs] [Times: user=6.60 sys=0.00, real=0.23 secs]
2020-11-08T08:31:29.106+0000: 237.119: [Full GC (Ergonomics) [PSYoungGen: 1521040K->0K(19665408K)] [ParOldGen: 9259509K->7378423K(10240000K)] 10780549K->7378423K(29905408K), [Metaspace: 107653K->107653K(1144832K)], 1.0809680 secs] [Times: user=20.55 sys=0.00, real=1.09 secs]

There are a few things in the log that really confused me:

  1. How does the JVM figure out the desired survivor size? Why was it around 2.5 GB, and why did it change slightly on each minor GC? Why did the total old gen size never change (10240000K) while the total young gen size changed all the time?
  2. Why was the new threshold always 1? Isn't that way too aggressive about moving things into the old gen?
  3. After each minor GC the young gen usually retained around 1.3 GB of data, and some data was moved to the old gen. That made the old gen gradually fill up until a Full GC eventually ran to clean it up. Why was a portion of the data promoted to the old gen on every minor GC? The survivor space looks big enough.
  4. What can I do to avoid unnecessary Full GCs so that overall throughput improves?
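For context on point 3, I estimated how much each minor GC promotes from the log above: the drop in young gen usage minus the drop in total heap usage is approximately the amount promoted to the old gen. A rough sketch using the second minor GC in the log (values copied from that line):

```shell
# Rough promotion estimate for one minor GC.
# promoted ≈ (young freed) - (heap freed)
YOUNG_BEFORE=17602106; YOUNG_AFTER=1307000   # PSYoungGen: 17602106K->1307000K
HEAP_BEFORE=24253865;  HEAP_AFTER=8618788    # heap: 24253865K->8618788K
echo "$(( (YOUNG_BEFORE - YOUNG_AFTER) - (HEAP_BEFORE - HEAP_AFTER) )) KB promoted"
# → 660029 KB promoted
```

So each minor GC seems to push roughly 600 MB or more into the old gen, which explains why the ~10 GB old gen fills up after a handful of minor collections.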
Z.SP
  • Couple of things: 1. I see that even with a Full GC, the JVM was not able to clear much space from the Old generation. This indicates long-referenced objects: it looks like your app is keeping ~6.5GB of objects "live". I also noticed you left only ~10GB for the Old generation. I think that's a very tight situation, and I'm not sure why you set a huge Young generation that is twice the size of the Old generation. That could be why you are seeing frequent Full GCs. 2. As per this blog [https://blogs.oracle.com/poonam/throughput-collector-and-printtenuringdistribution-option], Parallel GC does not use an age table. – suv3ndu Nov 10 '20 at 14:59
  • contd... So I think object age does not matter much in this case. Since you are exploring different GC algorithms, I suggest starting with the basics: first check how the different GC policies perform "out of the box", then start tuning the fastest of the lot. In this case, I'd suggest removing those NewSize params. You can also add -XX:+UseParallelOldGC to make Old generation collection faster by using parallel threads. – suv3ndu Nov 10 '20 at 15:00
  • Also use -XX:+PrintAdaptiveSizePolicy to get information about the amount of data that survived in the Young generation and the amount promoted to the Old generation on each Young GC. It will help you investigate further. – suv3ndu Nov 10 '20 at 15:01
  • @suv3ndu "Looks like your app is keeping ~6.5GB objects 'live'" – this is a great point, and that is definitely unexpected. The application reads data from one source, applies some conversion to the data, and then uploads it to a destination. The memory footprint shouldn't look like this. I need to figure out why some objects live longer than expected. – Z.SP Nov 13 '20 at 21:27

2 Answers


I'll just address point 4, as the answer from Sachith deals with the first 3. You have selected a GC that will not perform old gen (or full) GC until it's really needed. A full GC is the most expensive kind, and in the meantime the CPU cycles are dedicated to your work instead. Using a concurrent GC will remove some or all of the full GCs, but you will lose CPU cycles instead. So there is no guarantee that a concurrent GC will actually be faster.

Also, from your flags -Xms31200m -Xmx31200m: you are setting the min and max sizes of the heap to the same value, which means the VM will not perform any ergonomics (adaptations) on the heap. If the performance of your app matters and you have a decent test environment, I would suggest testing different GCs and seeing what kind of performance you get. I would also use the factory settings for everything except max heap and see how far that gets you.
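For example, a first comparison round could be as simple as the following (the jar name and log file names are placeholders, not from the question):

```shell
# Baseline: ParallelGC, factory settings except max heap
java -XX:+UseParallelGC -Xmx31200m \
     -Xloggc:parallel.log -XX:+PrintGCDetails \
     -jar app.jar

# Candidate: G1, same max heap, everything else default
java -XX:+UseG1GC -Xmx31200m \
     -Xloggc:g1.log -XX:+PrintGCDetails \
     -jar app.jar
```

Then compare throughput and pause times from the two logs under the same workload before adding any further tuning flags.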

Erik
  • 1) so you think if you specify those to around `31GB`, but use only `1GB` and you are hitting the target towards `200ms` default pauses, the JVM will not release memory back ([even with G1GC before java-12](https://stackoverflow.com/questions/59362760/does-g1gc-release-back-memory-to-the-os-even-if-xms-xmx/59377080#59377080)) and not resize regions? You are in for a surprise. 2) and leaving an app without a max heap specified is just a terrible idea – ever heard of containers? – Eugene Nov 11 '20 at 01:54
  • If you read what I actually wrote, I specifically said that max heap should be set. I also said nothing about releasing memory back to the OS. I assumed that the OP actually needed 31 GB and was displeased that he got full GCs. In fact, re-reading your comment, I don't think you're addressing my reply at all. – Erik Nov 11 '20 at 08:00
  • _...will not perform any ergonomics (adaptions) on the heap_, isn't this exactly what I tried to say with: _memory back ... and not resize regions_? The max heap part, I admit, I misread that and apologize for. – Eugene Nov 11 '20 at 19:15
  • When using the parallel collector, the collector will attempt to resize the old gen and young gen regions, but only within the margins given by Xms and Xmx; in other words, it is adaptive. This is generally regarded as a good thing. The price you pay is startup time and that it takes longer to reach some sort of steady state. I don't know if it includes returning heap to the OS. If you set Xmx == Xms, then the adaptive behavior is turned off. – Erik Nov 12 '20 at 08:00
  • Erik, you are correct and I don't know how I missed that in _both_ of my comments. The fact that you set `Xmx` and `Xms` to the same value, does not necessarily mean that adaptive is off. It does in fact mean that for ParallelGC and G1GC _at the moment_, but there is nothing stopping any other GC or future java version to narrow the heap (and thus resize regions) _below_ `Xms`. Though, I will just make that clear, this is the case at the moment. – Eugene Nov 12 '20 at 16:06

Well, your questions cannot all be answered with simple explanations, but let me try:

  1. The JVM uses the -XX:SurvivorRatio parameter to size the survivor spaces. The default is -XX:SurvivorRatio=8, a ratio meaning each survivor space is one-eighth the size of the Eden space, i.e. roughly one-tenth of the young generation. For your ~20 GB young generation that is about 2 GB per survivor space, which lines up with the ~2.5 GB desired survivor sizes in your log. According to this document, this is often not important for performance. Since you're setting a fixed, large size for the young generation, the old generation stays the same. -XX:+UseAdaptiveSizePolicy (on by default for ParallelGC) lets the collector adapt the sizes around the young/old boundary, which is why the desired survivor size shrinks and grows slightly between minor collections. Also, the larger the young generation, the less often minor collections happen.
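As a quick sanity check of that arithmetic (the young generation size is taken from the question's -XX:NewSize; the ratio is the default):

```shell
# one survivor space = young gen / (SurvivorRatio + 2)
# (young gen = eden + 2 survivors, and eden = SurvivorRatio * one survivor)
YOUNG_MB=21200      # -XX:NewSize=21200m from the question
SURVIVOR_RATIO=8    # default -XX:SurvivorRatio
echo "$(( YOUNG_MB / (SURVIVOR_RATIO + 2) )) MB per survivor space"  # → 2120 MB
```

The adaptive size policy then grows or shrinks the spaces around this starting point, which is what the slightly varying "Desired survivor size" lines in the log show.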

  2. The tenuring threshold is chosen by the JVM for ParallelGC. As per this article,

If survivor spaces are too small, copying collection overflows directly into the tenured generation. If survivor spaces are too large, they will be uselessly empty. At each garbage collection, the virtual machine chooses a threshold number, which is the number of times an object can be copied before it is tenured. This threshold is chosen to keep the survivors half full.

This seems like aggressive behavior, but the minor collection cycles are significantly far apart, and if required the threshold is allowed to rise back up to the maximum of 15 as well.

  3. If an object survives the chosen number of collection cycles in the young generation, then by ParallelGC's design it is destined to move to the old generation. No matter how large the young generation is, long-surviving objects will not stay in it forever: the young generation is for rapidly allocated and deallocated objects, not for long-lived ones. So, as you observed, the old generation eventually fills up and gets cleaned up.

  4. Assuming you're using Java 8 or a later version: to improve the throughput of the program, I would suggest trying G1GC instead of ParallelGC. Since your heap is significantly large, G1GC could be a good fit. The G1GC algorithm is designed to run on very large (terabyte-scale) heaps with minimal pause times, and it is recommended for heaps larger than 6GB (Garbage First Garbage Collector Tuning). When using G1GC, -XX:+UseStringDeduplication can be a great help if your program works with many duplicate String objects. This GC partitions the entire heap into many small regions and executes the collection using parallel and concurrent threads.
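A minimal G1 starting point might look like this (the explicit pause target is just the G1 default shown for clarity, and app.jar is a placeholder; note that the fixed -XX:NewSize/-XX:MaxNewSize settings are best dropped with G1, since explicit young-gen sizing interferes with its pause-time goal):

```shell
# MaxGCPauseMillis=200 is the G1 default pause target, shown explicitly.
# UseStringDeduplication is G1-specific. app.jar is a placeholder.
java -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -XX:+UseStringDeduplication \
     -Xms31200m -Xmx31200m \
     -Xloggc:g1.log -XX:+PrintGCDetails \
     -jar app.jar
```

Let G1 size the generations itself first, and only add flags one at a time if the logs show a specific problem.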

There are also two newer GCs, ZGC and Shenandoah, released as experimental features with Java 11 and Java 12 respectively. These GCs significantly reduce pause times by doing more of the collection work concurrently with the application.

Update: stable releases of ZGC and Shenandoah are available as of Java 15, released in September 2020.
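For completeness, enabling them is a single flag each (the jar name is a placeholder; on the earlier JDKs where they were still experimental, they additionally require -XX:+UnlockExperimentalVMOptions):

```shell
# ZGC (production-ready since JDK 15)
java -XX:+UseZGC -Xmx31200m -jar app.jar

# Shenandoah (production-ready since JDK 15; availability depends on the JDK build)
java -XX:+UseShenandoahGC -Xmx31200m -jar app.jar
```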

Sachith Dickwella
  • 1) _no pause time at all_ wrong, they do pause, always, even if for a very short period of time. 2) they are not experimental anymore 3) `UseAdaptiveSizePolicy` will cancel `SurvivorRatio` out (and that is on by default) 4) if you make the young generation large, your minor collections will get _larger_ too, even if spread apart in time... there are more points in your answer where you confuse things, badly. – Eugene Nov 11 '20 at 02:08
  • I agree with you about the first point; ZGC and Shenandoah do have very short pause times. I thought concurrent threads made those GCs pause-free, but that's wrong. About the second point, they were experimental (or at least not supported on some platforms) up until Java 15. So yes, now they are not experimental. And I'm not saying the `UseAdaptiveSizePolicy` and `SurvivorRatio` flags should be used together; obviously they do somewhat opposite things. If there are more confusing points, I'd be glad if you could point them out; it would help both me and the OP. Thanks! – Sachith Dickwella Nov 11 '20 at 02:57