Tweak G1 Java garbage collector for >100GB heaps, but 90% static

Question

When reading about tuning options for GC G1 in Java I stumble over the fact that humongous are bad and should be avoided. In my application I have a huge portion of my RAM (110GB of a RAM with Xmx=Xms=130g) living the whole application uptime, so I cannot avoid this. Now I sometimes see timeouts and in the GC logs I see 'region allocation request failed':

2017-09-24T17:12:59.572+0200: 188333.617: [GC pause (G1 Evacuation Pause) (young) 188333.617: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 9857, predicted base time: 8.63 ms, remaining time: 191.37 ms, target pause time: 200.00 ms]
 188333.617: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 182 regions, survivors: 26 regions, predicted young region time: 377.38 ms]
 188333.617: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 182 regions, survivors: 26 regions, old: 0 regions, predicted pause time: 386.01 ms, target pause time: 200.00 ms]
 188333.969: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: region allocation request failed, allocation request: 643456 bytes]
 188333.969: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 643456 bytes, attempted expansion amount: 33554432 bytes]
 188333.969: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap already fully expanded]
 188333.993: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 133311758336 bytes, allocation request: 0 bytes, threshold: 62813896695 bytes (45.00 %), source: end of GC]
 (to-space exhausted), 0.3754796 secs]
   [Parallel Time: 356.3 ms, GC Workers: 10]
      [GC Worker Start (ms): Min: 188333617.4, Avg: 188333617.6, Max: 188333617.6, Diff: 0.2]
      [Ext Root Scanning (ms): Min: 0.2, Avg: 0.3, Max: 0.5, Diff: 0.2, Sum: 3.3]
      [Update RS (ms): Min: 1.2, Avg: 1.2, Max: 1.3, Diff: 0.1, Sum: 12.5]
         [Processed Buffers: Min: 2, Avg: 4.8, Max: 9, Diff: 7, Sum: 48]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.3]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 354.4, Avg: 354.5, Max: 354.5, Diff: 0.1, Sum: 3544.6]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.3]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 10]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 356.0, Avg: 356.1, Max: 356.2, Diff: 0.2, Sum: 3561.1]
      [GC Worker End (ms): Min: 188333973.7, Avg: 188333973.7, Max: 188333973.7, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 18.9 ms]
      [Evacuation Failure: 18.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.3 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.1 ms]
   [Eden: 5824.0M(5824.0M)->0.0B(5824.0M) Survivors: 832.0M->832.0M Heap: 124.1G(130.0G)->125.0G(130.0G)]
 [Times: user=3.58 sys=0.00, real=0.38 secs] 
 188334.826: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle, reason: concurrent cycle initiation requested]

What does the line regarding failed allocation request mean? Could this be a garbage collection issue? And if yes, what could I try to improve this situation? E.g. what could be some GC configs so that unavoidable humongous objects are properly handled.

My JVM settings are:

-XX:+UseG1GC -Xloggc:$GCLOGS -XX:+PrintGCDateStamps -XX:MetaspaceSize=100M -XX:MaxGCPauseMillis=200 -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDetails

BTW: the situation is far better now after disabling swapping compared to our previous situation.

It's not clear what problem you're trying to solve or if in fact there is any problem at all. — the8472, Sep 25 '17 at 08:55
Yeah, when rereading this you are right. Will try to improve the question. — Karussell, Sep 25 '17 at 15:38
The GC log you posted does not show any excessive GC behavior (although to-space exhaustion may indicate some suboptimal behavior). You say something about timeouts and humongous allocations but there's no data on that. So really, there's insufficient information to help. — the8472, Sep 25 '17 at 16:29
I'm sorry for this. I thought that the problem might be 'reason: region allocation request failed'. Doesn't this indicate there is a problem with the GC and could causing the rare timeouts I'm seeing? — Karussell, Sep 27 '17 at 08:54
a) you are assuming *without verifying* that GC is related to your timeouts b) IF gc is related to timeouts it would be due to long GC pauses or OOMEs, which are not evident from your log — the8472, Sep 27 '17 at 11:57
Ok. I'm sorry for my lack of knowledge here. So the line regarding the allocation request failed is not a problematic one and just for my information? Or what can I read from this? — Karussell, Sep 27 '17 at 13:57
It is informational. It tells you that the heuristics think it would be good to expand the heap but they can't because the maximum heap size is already reached. It might hint that it could perform better, but it's not something that indicates timeouts. Anyway, asking for explanations about random things in your GC logs is not going to help you solve your problem anytime soon because there is a lot of information in there. First of all you need to establish whether your problem is related to GC at all and *after that* one could look for anomalies, but that would require a full GC log. — the8472, Sep 27 '17 at 14:25
Thanks a lot for this information! I tried to solve the problem differently (and as it now looks I've overlooked something) but I didn't find anything and so looked into GC problems where I though I found something. Now this is highly likely a new feature we introduced and where I didn't find anything ugly before. So thanks again! — Karussell, Sep 28 '17 at 06:04

Tweak G1 Java garbage collector for >100GB heaps, but 90% static

0 Answers0