When reading about tuning options for GC G1 in Java I stumble over the fact that humongous are bad and should be avoided. In my application I have a huge portion of my RAM (110GB of a RAM with Xmx=Xms=130g) living the whole application uptime, so I cannot avoid this. Now I sometimes see timeouts and in the GC logs I see 'region allocation request failed':
2017-09-24T17:12:59.572+0200: 188333.617: [GC pause (G1 Evacuation Pause) (young) 188333.617: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 9857, predicted base time: 8.63 ms, remaining time: 191.37 ms, target pause time: 200.00 ms]
188333.617: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 182 regions, survivors: 26 regions, predicted young region time: 377.38 ms]
188333.617: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 182 regions, survivors: 26 regions, old: 0 regions, predicted pause time: 386.01 ms, target pause time: 200.00 ms]
188333.969: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: region allocation request failed, allocation request: 643456 bytes]
188333.969: [G1Ergonomics (Heap Sizing) expand the heap, requested expansion amount: 643456 bytes, attempted expansion amount: 33554432 bytes]
188333.969: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: heap already fully expanded]
188333.993: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 133311758336 bytes, allocation request: 0 bytes, threshold: 62813896695 bytes (45.00 %), source: end of GC]
(to-space exhausted), 0.3754796 secs]
[Parallel Time: 356.3 ms, GC Workers: 10]
[GC Worker Start (ms): Min: 188333617.4, Avg: 188333617.6, Max: 188333617.6, Diff: 0.2]
[Ext Root Scanning (ms): Min: 0.2, Avg: 0.3, Max: 0.5, Diff: 0.2, Sum: 3.3]
[Update RS (ms): Min: 1.2, Avg: 1.2, Max: 1.3, Diff: 0.1, Sum: 12.5]
[Processed Buffers: Min: 2, Avg: 4.8, Max: 9, Diff: 7, Sum: 48]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.3]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 354.4, Avg: 354.5, Max: 354.5, Diff: 0.1, Sum: 3544.6]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.3]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 10]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Total (ms): Min: 356.0, Avg: 356.1, Max: 356.2, Diff: 0.2, Sum: 3561.1]
[GC Worker End (ms): Min: 188333973.7, Avg: 188333973.7, Max: 188333973.7, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.2 ms]
[Other: 18.9 ms]
[Evacuation Failure: 18.2 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.3 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 5824.0M(5824.0M)->0.0B(5824.0M) Survivors: 832.0M->832.0M Heap: 124.1G(130.0G)->125.0G(130.0G)]
[Times: user=3.58 sys=0.00, real=0.38 secs]
188334.826: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle, reason: concurrent cycle initiation requested]
What does the line regarding failed allocation request mean? Could this be a garbage collection issue? And if yes, what could I try to improve this situation? E.g. what could be some GC configs so that unavoidable humongous objects are properly handled.
My JVM settings are:
-XX:+UseG1GC -Xloggc:$GCLOGS -XX:+PrintGCDateStamps -XX:MetaspaceSize=100M -XX:MaxGCPauseMillis=200 -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDetails
BTW: the situation is far better now after disabling swapping compared to our previous situation.