
We just had a meeting to address some performance issues in a web application that is used to calculate insurance rates. The calculations are implemented in a C/C++ module that is also used in other software packages. To make it available as a web service, a Java wrapper was implemented that exposes an XML-based interface and calls the C/C++ module via JNI.

Measurements showed that several seconds were spent on each calculation inside the Java part. So my first recommendation was to enable garbage collection logging in the VM. We could see at once that many stop-the-world full GCs were performed. When we discussed this, the developer of the Java part told us they call System.gc() in several places "to make sure the memory is released after use".

OK, I won't elaborate on that statement any further... ;-)

We then added the above-mentioned -XX:+DisableExplicitGC to the VM's arguments and reran the tests. This saved about 5 seconds per calculation.
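For illustration, a toy probe like the one below (not our actual wrapper code, class name and sizes made up) makes the difference visible: run it with GC logging enabled, once with and once without the flag, and compare the log output and the time spent in the explicit call.

```java
// Toy probe, not the real wrapper: shows whether an explicit System.gc() still
// triggers a stop-the-world full collection under the current VM arguments.
// Run e.g. with: java -verbose:gc ExplicitGcProbe
//      and with: java -verbose:gc -XX:+DisableExplicitGC ExplicitGcProbe
public class ExplicitGcProbe {
    public static void main(String[] args) {
        byte[] garbage = new byte[32 * 1024 * 1024]; // allocate something collectable
        garbage = null;

        long start = System.nanoTime();
        System.gc(); // full stop-the-world GC by default; a no-op with -XX:+DisableExplicitGC
        long millis = (System.nanoTime() - start) / 1000000;

        System.out.println("System.gc() returned after " + millis + " ms");
    }
}
```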

Since we cannot change the code to strip out all those System.gc() calls at this point in our release process, we are thinking about adding -XX:+DisableExplicitGC in production until a new JAR can be created.

Now the question is: could there be any risk in doing so? About the only thing I can think of is Tomcat using System.gc() internally when redeploying, but that's just a guess. Are there any other hazards ahead?

Tomasz Nurkiewicz
Axel
  • Since `System.gc` has no guarantee anyway, you will technically not break anything, but what may happen is that you trigger a bug in some code that is already broken. There's little consolation in that, though. – Marko Topolnik Oct 11 '12 at 19:37
  • @MarkoTopolnik: According to the documentation, it is not guaranteed to do anything. But in practice (our version of the JVM, our set of vmargs etc.) it does. Have a look at the comments and answers to http://stackoverflow.com/questions/6941802/system-gc-calls-by-core-apis (just found this via google). – Axel Oct 11 '12 at 19:42
  • Of course it **does** something, that's why it exists :) However, if it didn't do anything, it would still respect its contract. Therefore any code that malfunctions under `-XX:+DisableExplicitGC` is broken. – Marko Topolnik Oct 11 '12 at 19:44
  • More proof that one should almost never try to second-guess the GC and invoke `System.gc()` yourself :) – matt b Oct 11 '12 at 19:46
  • @MarkoTopolnik: I was referring to the comment "Interestingly, if direct buffers are used, DisableExplicitGC flag becomes quite dangerous" on that question. But I think we will just do a test cycle and give it a go. – Axel Oct 11 '12 at 19:47
  • You are correct that there is at least one place where Tomcat tries to call `System.gc()` when a context is reloaded: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.tomcat/tomcat-catalina/7.0.29/org/apache/catalina/core/StandardHost.java#StandardHost.findReloadedContextMemoryLeaks%28%29 – matt b Oct 11 '12 at 19:49
  • @Axel, keep in mind that any code that is dangerous **with** the flag is also dangerous **without** the flag. For example, if your production environment at any point starts using the ConcurrentMarkSweep GC, the call to `System.gc` will only start a concurrent GC run, which will in all probability still be in progress when control returns from `gc` to the caller. – Marko Topolnik Oct 11 '12 at 19:50
  • @MarkoTopolnik actually it'll start a full GC, so I'm not sure what that entails for the CMS already running. – Frank Pavageau Oct 11 '12 at 19:54
  • @FrankPavageau Not with `-XX:+ExplicitGCInvokesConcurrent` or `-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses`. As you point out in your answer, this should in fact be preferred to disabling the explicit GC altogether, but, as far as code relying on `System.gc` returning control upon GC completion is concerned, all these flags are on an equal footing. – Marko Topolnik Oct 11 '12 at 20:00
  • @Axel: For what it's worth, it's probably pretty unlikely you're using NIO direct buffers without knowing this. They seem to be an IO optimisation for dealing with large, long-lived byte buffers, which doesn't sound like something that would come up in a line of business webapp. – millimoose Oct 11 '12 at 20:03
  • @millimoose Most modern cache implementations use direct buffers for off-heap storage. But I guess a cache implementation wouldn't be intrusive to the regular operation of the heap. – Marko Topolnik Oct 11 '12 at 20:04

3 Answers


You are not alone in fixing stop-the-world GC events by setting the -XX:+DisableExplicitGC flag. Unfortunately (and in spite of the disclaimers in the documentation), many developers decide they know better than the JVM when to collect memory and introduce exactly this type of issue.

I'm aware of many instances where the -XX:+DisableExplicitGC flag improved the production environment and zero instances where there were any negative side effects.

The safe thing to do is to run your current production code, under load, with that flag set in a stress test environment and perform a normal QA cycle.

If you cannot do that, I would suggest that the risk of setting the flag is less than the cost of not setting it in most cases.

Eric J.
  • On this note: "many developers decide they know better than the JVM". Actually this is a problem with the Java documentation. The description of `System.gc` says nothing about it being a dangerous command. I've seen way too many tutorials that suggest using `System.gc` *routinely* to release memory. https://docs.oracle.com/javase/7/docs/api/java/lang/System.html#gc() – Nux Jan 31 '20 at 14:11

I've been wrestling with this same issue, and based on all the information I've been able to find, there definitely appears to be some risk. Per the comments on your original post from @millimoose, as well as https://bugs.openjdk.java.net/browse/JDK-6200079, it appears that setting -XX:+DisableExplicitGC would be a bad idea if NIO direct buffers are being used. They appear to be used in the internal implementation of the WebSphere 8.5 app server we're using. Here's the stack trace I was able to capture while debugging this:

3XMTHREADINFO      "WebContainer : 25" J9VMThread:0x0000000006FC5D00, j9thread_t:0x00007F60E41753E0, java/lang/Thread:0x000000060B735590, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0xFE, isDaemon:true)
3XMTHREADINFO1            (native thread ID:0x1039, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO2            (native stack address range from:0x00007F6067621000, to:0x00007F6067662000, size:0x41000)
3XMCPUTIME               CPU usage total: 80.222215853 secs
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=1594568 (0x1854C8)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/System.gc(System.java:329)
4XESTACKTRACE                at java/nio/Bits.syncReserveMemory(Bits.java:721)
5XESTACKTRACE                   (entered lock: java/nio/Bits@0x000000060000B690, entry count: 1)
4XESTACKTRACE                at java/nio/Bits.reserveMemory(Bits.java:766(Compiled Code))
4XESTACKTRACE                at java/nio/DirectByteBuffer.<init>(DirectByteBuffer.java:123(Compiled Code))
4XESTACKTRACE                at java/nio/ByteBuffer.allocateDirect(ByteBuffer.java:306(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/buffermgmt/impl/WsByteBufferPoolManagerImpl.allocateBufferDirect(WsByteBufferPoolManagerImpl.java:706(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/buffermgmt/impl/WsByteBufferPoolManagerImpl.allocateCommon(WsByteBufferPoolManagerImpl.java:612(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/buffermgmt/impl/WsByteBufferPoolManagerImpl.allocateDirect(WsByteBufferPoolManagerImpl.java:527(Compiled Code))
4XESTACKTRACE                at com/ibm/io/async/ResultHandler.runEventProcessingLoop(ResultHandler.java:507(Compiled Code))
4XESTACKTRACE                at com/ibm/io/async/ResultHandler$2.run(ResultHandler.java:905(Compiled Code))
4XESTACKTRACE                at com/ibm/ws/util/ThreadPool$Worker.run(ThreadPool.java:1864(Compiled Code))
3XMTHREADINFO3           Native callstack:
4XENATIVESTACK               (0x00007F61083DD122 [libj9prt26.so+0x13122])
4XENATIVESTACK               (0x00007F61083EA79F [libj9prt26.so+0x2079f])
....

Just what the full ramifications are of setting -XX:+DisableExplicitGC when NIO direct byte buffers are being used isn't entirely clear to me yet (does this introduce a memory leak?), but there does at least appear to be some risk. If you're using an app server other than WebSphere, you may want to verify that the app server itself isn't invoking System.gc() via NIO before disabling it. I've got a related question that will hopefully get some clarification on the exact impact on the NIO libraries here: Impact of setting -XX:+DisableExplicitGC when NIO direct buffers are used
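To make the concern a bit more concrete, here's a small sketch (not application code, sizes made up) of why that internal System.gc() matters: the native memory behind a direct buffer is only released once the buffer object itself is collected, and when the direct-memory limit is hit, Bits.reserveMemory falls back to System.gc() to try to reclaim unreachable buffers before giving up.

```java
import java.nio.ByteBuffer;

// Small sketch, not application code: allocates direct buffers and drops the references
// immediately. When the direct-memory limit (-XX:MaxDirectMemorySize) is reached,
// java.nio.Bits calls System.gc() in the hope of collecting unreachable buffers (which
// is what frees their native memory). With -XX:+DisableExplicitGC that call does nothing,
// so a loop like this is more likely to end in
// java.lang.OutOfMemoryError: Direct buffer memory.
public class DirectBufferPressure {
    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            ByteBuffer.allocateDirect(16 * 1024 * 1024); // 16 MB, reference dropped right away
        }
        System.out.println("finished without OutOfMemoryError");
    }
}
```

Running it with something like -XX:MaxDirectMemorySize=64m, once with and once without -XX:+DisableExplicitGC, should show the difference, though the exact behaviour will depend on the JVM and collector in use.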

Incidentally, WebSphere also seems to invoke System.gc() manually several times during the boot process, usually twice within the first couple of seconds after the app server is launched, and a third time within the first 1-2 minutes (possibly when the application is being deployed). In our case, this is why we started investigating in the first place, as it appears that all the System.gc() calls are coming directly from the app server and never from our application code.

It should also be noted that in addition to the NIO libraries, the JDK's internal implementation of RMI distributed garbage collection also calls System.gc(): Unexplained System.gc() calls due to Remote Method Invocation and System.gc() calls by core APIs

Whether enabling -XX:+DisableExplicitGC will also wreak havoc with RMI DGC is a little unclear to me as well. The only reference I've been able to find that even addresses this is the first one above, which states:

"However, in most cases regular GC activity is sufficient for effective DGC"

That 'in most cases' qualifier sounds awfully wishy-washy to me, so again, it seems like there's at least some risk in just shutting off all System.gc() calls. You'd be better off fixing the calls in your code if at all possible and only shutting them off entirely as a last resort.

rscarter
  • If you use -XX:+DisableExplicitGC together with an RMI client, you can cause an OOM on your RMI server due to this bug: https://bugs.openjdk.java.net/browse/JDK-6791811 – Anton Koscejev Oct 29 '18 at 15:40

If you use -XX:+DisableExplicitGC and use CMS, you might want to use -XX:+CMSClassUnloadingEnabled as well to limit another reason for full GCs (i.e. the PermGen being full). Other than that, I haven't had problems using the option, though I've switched to using -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses, because my only cause of explicit GCs was RMI, not application code.
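For illustration, a combined set of arguments along the lines of that second option might look like this (an assumed command line; the jar name is made up and heap/GC sizing is omitted):

```
java -XX:+UseConcMarkSweepGC \
     -XX:+CMSClassUnloadingEnabled \
     -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses \
     -jar rating-service.jar
```

The last flag makes explicit System.gc() calls trigger a concurrent collection (including class unloading) rather than a stop-the-world full GC, instead of silencing them entirely as -XX:+DisableExplicitGC does.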

Alfabravo
Frank Pavageau