
We have a Java application that reads a chunk of data but keeps that data only for a short period of time. The data is stored in "simple" collections (HashMap, HashSet). These collections are cleared once the data has been processed (i.e. I call coll.clear(), not coll = null). The cycle (read-process-clear) continues until all chunks of data have been processed. After a certain amount of time there will be new chunks, and the whole thing starts again. A minimal sketch of the cycle is shown below.
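(The ChunkSource type, method names, and field names below are made up for illustration; they are not our actual API:)

import java.util.HashMap;
import java.util.Map;

public class ChunkProcessor {

    /** Hypothetical source of chunks; stands in for our actual reader. */
    interface ChunkSource {
        boolean hasMoreChunks();
        Map<String, Object> readChunk();
    }

    // One collection instance is reused across all cycles.
    private final Map<String, Object> buffer = new HashMap<String, Object>();

    void run(final ChunkSource source) {
        while (source.hasMoreChunks()) {
            buffer.putAll(source.readChunk()); // read
            process(buffer);                   // process
            buffer.clear();                    // clear: coll.clear(), not coll = null
        }
    }

    private void process(final Map<String, Object> data) {
        // process data...
    }
}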

This process has run for several weeks on a server without any problem. However, today, after a planned restart, it crashed over and over again with an OutOfMemoryError: Java heap space (and was restarted automatically each time by a monitoring process).

I connected to the process with a remote debugger and with the jvisualvm tool to try to find whether (and where) I could have a memory leak. While the processing thread was paused (by the debugger) right after the calls to clear(), I forced a GC with the jvisualvm tool. As I had expected, it cleared almost the entire heap (only 4 MB used). The next cycles showed the same behaviour: almost no heap usage after clear(), and so on. In the end, the process did not run out of memory anymore!

To me, it looks like the garbage collector failed to work correctly...

  • how can I verify if that's the case?
  • if so, how can this be?
  • should I call System.gc() after the clear() methods?

    But as far as I know (and have read here), that would only be a "suggestion" to the VM; and the GC will always collect all possible garbage when the heap is almost full; and such a call should simply be avoided :-)...

(we're running Java 1.6.0_51-b11 in server mode on Solaris, no special GC-options)
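One way to verify what the GC is actually doing would be to enable verbose GC logging and an automatic heap dump on OOM. A sketch of the launch flags (all standard HotSpot options; the jar name and dump path are placeholders):

java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp/heap.hprof \
     -jar our-app.jar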

EDIT after analyzing heap dumps:

Our code has this structure:

final DataCollector collector = ...
while (!collector.isDone()) {
    final List<Data> dataList = collector.collectNext();
    for (final Data data : dataList) {
        // process data...
    }
}

The OOMError occurs while executing the collector.collectNext() method.

It looks like the heap still contains the List referenced by the dataList variable (and all its Data objects) from the previous iteration of the while loop!

Is it normal behaviour that a local variable of a while loop does not get garbage-collected? If that's true, we have to give this process almost twice as much memory as strictly needed...

As a hack/check, I added a dataList = null assignment after the for-loop (after dropping the final modifier), but this does not change the behaviour (still OOM, and the heap dump still shows the same 'double assignment').

(I guess we were lucky that the process did not crash earlier.)
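A restructuring we may try instead (a sketch, assuming the loop frame's local slot is what keeps the previous list reachable): move the loop body into its own method, so the reference to the previous list is gone from the stack before collectNext() allocates the next chunk:

final DataCollector collector = ...
while (!collector.isDone()) {
    // dataList now lives only in processChunk's frame; by the time
    // collectNext() runs again, no local slot here references the old list.
    processChunk(collector.collectNext());
}

private void processChunk(final List<Data> dataList) {
    for (final Data data : dataList) {
        // process data...
    }
}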

  • Post the entire stack trace for the `OutOfMemoryError`; it can have additional messages that are helpful. – chrylis -cautiouslyoptimistic- Apr 21 '15 at 19:32
  • "The entire stacktrace" is impossible: the process had crashed +40 times, each time with a complete different stack trace (except for the 5 calls after `main()`). I.m.o. it is also useless: the exact code where the `OOMError` occurs can be practically anything, here I see it occuring in `java.nio.ByteBuffer.wrap`, `com.sybase.jdbc4.utils.BufferPool.makeBuffer`, `java.util.GregorianCalendar.computeFields`, `java.util.jar.Manifest$FastInputStream.`, etc... – geronimo Apr 21 '15 at 19:44
  • @chrylis: `Java heap space`, if that's what you mean. Updated original question. – geronimo Apr 21 '15 at 19:55
  • Are you sure nothing has changed? Some Java update or a new version of the application? It's inexplicable, so everything must be considered. As a hack I'd try to give the process more memory and I'd *surely* try `System.gc()` (just remove it later when the problem gets solved). – maaartinus Apr 21 '15 at 20:16
  • @maaartinus: yes, I'm sure :-). – geronimo Apr 22 '15 at 09:24
  • Sure, the GC can be broken. However, it is way more probable that your own code is broken. Add `-XX:+HeapDumpOnOutOfMemoryError` and analyze the heap dump with Eclipse MAT. – K Erlandsson Apr 22 '15 at 10:31
  • @KristofferE: analyzing the heap dump has puzzled me more, see my edited question... – geronimo Apr 23 '15 at 10:36
  • The next step I think would be to turn on all the verbose garbage collecting info. To me it seems like something is going wrong with the garbage collector since it works if you manually do it, and it's just a matter of figuring that out. – Necreaux Apr 23 '15 at 12:57
  • I just noticed you are on b11? Upgrade that JVM at least to the latest version of Java 6 if not to Java 8. Earlier builds of Java 6 are VERY buggy. – Necreaux Apr 23 '15 at 13:03
  • We're on Java 6 update 51 (not update 11). But the problem persists in Java 7 and 8. After analyzing more heap dumps, the objects that we believe should be collected are all immediate members of the "native stack". Apparently there are still stack pointers into the heap that have not yet been reset to `null` but that we know will never be used again. Also, I cannot reproduce my "fix" of manually forcing a GC as initially described. I guess I was lucky, and the only option is to increase the heap size -- which we have done. – geronimo May 06 '15 at 07:19

2 Answers


I have run into issues where the garbage collector bails out after a certain amount of time and false OOMEs occur. This tends to happen when you have a complex chain of objects with lots of cycles. The solution is to use the -XX:-UseGCOverheadLimit flag to tell the garbage collector not to give up.
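For example, the flag goes on the launch command like this (a sketch; the heap size and jar name are placeholders):

java -Xmx2g -XX:-UseGCOverheadLimit -jar your-app.jar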

    I disagree twice: 1. GC has no problem with complex chains, especially if unreachable (GC does not really collect garbage, it picks up the non-garbage instead and what's left is free memory). 2. It's not really false OOME, it's a situation when collecting memory takes long and the gain is small. This means that the application will hardly get a chance to run. – maaartinus Apr 21 '15 at 20:10
  • @maaartinus This has been the recommended workaround by Oracle/Sun for various bugs. In particular there was a nasty one with SoftReference objects resulting in things never being garbage collected, but there have been various others. Many of these have been fixed/adjusted in newer JVMs, but the OP is clearly using an older JVM in which this may not have been fixed. Once the GC gets past the "complexities" the first time, it runs smoothly. It doesn't cause the application to freeze up as you suggest. – Necreaux Apr 21 '15 at 20:27
  • I know about the `SoftReference` bug, but not the others. Every time I saw "GC overhead too high", things got only worse when suppressed. But my experience is limited, so you may be right. However, the OP's getting "Java heap space", which seems to be a different problem. – maaartinus Apr 21 '15 at 20:50
  • @Necreaux: from what I can find about that option, it would only have an effect if the error message was `GC overhead limit exceeded`. In my case it's a `java heap space` error, so that wouldn't help... or is this not correct? Also, could you point to these Oracle/Sun recommendations, and/or the bugs? – geronimo Apr 22 '15 at 09:22
  • @Geronimo Yes, the `GC overhead limit exceeded` is it, but I've seen it buried inside the stack trace or chained inside other exceptions. Depending on how your exceptions are being printed, you might not be seeing it. My thought has always been that this option can't hurt. AFAIK worst case scenario, if there is a GENUINE memory leak, this just makes the JVM thrash around a little while before the OOME rather than throwing it immediately. It's also something to try when you don't have any other good ideas. – Necreaux Apr 22 '15 at 12:01
  • Adding this flag does **not** help :-( – geronimo Apr 23 '15 at 12:41

I had the same issue, and what I did was:

Clean your project (in Eclipse) by going to: Project > Clean > Clean all projects (tick this option).

If the problem persists, you need to refactor your code and get rid of the memory leak.
