
We are facing a strange memory leak issue in our application.

GC configuration: ParNew + CMS

Objects of a certain type are being promoted to the old generation too early, which causes severe fragmentation.

  1. The survivor space had enough room to accommodate these objects.
  2. The tenuring (ageing) threshold is 15 cycles, and no premature promotion happened on that basis.
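
One way to double-check these two points from inside the running JVM is to read the effective tenuring threshold and the per-pool heap usage through the standard management beans. The following is only a minimal sketch, not part of our actual setup: the class name `TenuringReport` and its methods are hypothetical, and the `HotSpotDiagnosticMXBean` lookup assumes a HotSpot JVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports the tenuring threshold and heap pool usage. */
final class TenuringReport {

    /** Reads a HotSpot flag by name; returns null if the flag or bean is unavailable. */
    static String vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? null : bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return null; // no such flag in this JVM
        }
    }

    /** One line per heap memory pool (eden, survivor, old gen; names depend on the collector). */
    static List<String> heapPoolLines() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, etc.
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            lines.add(String.format("%s: used=%d KB, committed=%d KB",
                    pool.getName(), usage.getUsed() / 1024, usage.getCommitted() / 1024));
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("MaxTenuringThreshold = " + vmOption("MaxTenuringThreshold"));
        System.out.println("SurvivorRatio = " + vmOption("SurvivorRatio"));
        heapPoolLines().forEach(System.out::println);
    }
}
```

Logging these lines periodically (for example, once per request batch) would show whether the survivor pool really stays below capacity while the old generation pool grows.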

About those objects: these are proxy objects created with the Javassist library.

Because of this unnecessary promotion, the old generation gets polluted too soon and becomes heavily fragmented.

Our observations:

  1. The objects are allocated in eden only, so this is not a size-related problem.
  2. The objects are very short-lived and become eligible for collection at the next minor GC.
  3. To confirm this, we added log statements in finalize() and observed that the object's lifetime ends immediately after the request, just after the first minor GC.

Note: finalize() was added only for tracking purposes. The promotion to the old generation happens even without finalize().

  4. After a single minor GC:
     • The expectation is that the object gets cleared.
     • Instead, the object is promoted to the old generation. With the help of multiple heap dumps, we were able to track the promotion to the old generation.
  5. All such objects accumulate in the old generation and are collected only by the old-generation GC.
  6. This behaviour is seen only on production servers and is not reproducible in test environments.
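
Because a non-trivial finalize() itself delays reclamation by at least one extra GC cycle, a less intrusive way to observe when an object becomes unreachable is a weak reference registered with a reference queue. The following is a minimal sketch under that assumption, not what we currently run: `LivenessProbe` and its method names are hypothetical, and `System.gc()` is only a hint that the JVM may ignore (for example with `-XX:+DisableExplicitGC`).

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

/** Hypothetical helper: tracks one object's reachability without finalize(). */
final class LivenessProbe {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final WeakReference<Object> ref;

    LivenessProbe(Object target) {
        // The weak reference does not keep the target alive.
        this.ref = new WeakReference<>(target, queue);
    }

    /** True once the collector has cleared the weak reference. */
    boolean isCleared() {
        return ref.get() == null;
    }

    /** Requests GCs until the reference is enqueued or the timeout expires. */
    boolean awaitCollected(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (System.currentTimeMillis() < deadline) {
                System.gc(); // a hint only; used here just for the demonstration
                Reference<?> enqueued = queue.remove(50); // wait up to 50 ms
                if (enqueued == ref) {
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return false;
    }

    public static void main(String[] args) {
        Object shortLived = new Object();
        LivenessProbe probe = new LivenessProbe(shortLived);
        System.out.println("cleared while referenced: " + probe.isCleared());
        shortLived = null; // drop the only strong reference
        System.out.println("collected after release: " + probe.awaitCollected(5000));
    }
}
```

In a real service the explicit `System.gc()` call would be dropped, and the queue would be polled from a background thread so that the timestamp of each enqueue can be compared against the GC log.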

Please suggest how to proceed and fix this.

Another interesting update: with G1GC, the objects are cleared properly. We checked this using the finalize() method: after the first cycle, the object became unreachable, and after the next minor cycle it was gone. The issue does not occur with G1GC.
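
Since the behaviour differs between collectors and between production and test servers, it may help to confirm which collectors each JVM actually runs. The following is a minimal sketch using the standard `GarbageCollectorMXBean` API; the class name `CollectorReport` is hypothetical. On a Java 8 HotSpot JVM the reported names are typically `ParNew` and `ConcurrentMarkSweep` for ParNew + CMS, and `G1 Young Generation` and `G1 Old Generation` for G1.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: lists the active collectors and their counters. */
final class CollectorReport {

    /** One line per collector: name, collection count, total time, managed pools. */
    static List<String> collectorLines() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(String.format("%s: count=%d, time=%d ms, pools=%s",
                    gc.getName(),
                    gc.getCollectionCount(),   // -1 if undefined for this collector
                    gc.getCollectionTime(),    // approximate accumulated time in ms
                    Arrays.toString(gc.getMemoryPoolNames())));
        }
        return lines;
    }

    public static void main(String[] args) {
        collectorLines().forEach(System.out::println);
    }
}
```

Comparing this output between a production server and a test server is a quick way to rule out a configuration difference before digging further.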

vkm
  • Could use (over-use) of finalize be causing this; e.g. objects in the finalization queue being promoted? – Stephen C Sep 01 '21 at 06:49
  • Agree with @StephenC. `finalize()` should really _never_ be used - it causes objects to require at least two GC cycles to clear in the best case. And using it to observe GC behaviour is unhelpful as it changes GC behaviour. – Boris the Spider Sep 01 '21 at 06:54
  • Also note that [CMS is deprecated](https://openjdk.java.net/jeps/291). Consider switching to a modern alternative. – Boris the Spider Sep 01 '21 at 06:55
  • @StephenC To be clearer, I added a finalize() method only to track the liveness of the object. Yes, even without the finalize() method, it got promoted to old gen. – vkm Sep 01 '21 at 06:56
  • @BoristheSpider G1GC looks promising for our application; part of our system is already running on it. Thanks for the suggestion. – vkm Sep 01 '21 at 06:57
  • How does a heap dump prove that an object has been promoted to the old generation? As far as I know, it doesn’t contain information about generations at all. – Holger Sep 01 '21 at 07:42
  • @Holger Version: Java 8. With the flag -XX:+PrintHeapAtGC, it is possible to print the region-specific addresses. After that, we take a heap dump and parse it using Eclipse Memory Analyzer (MAT). By writing an OQL query based on address, we can filter the objects by region. By taking multiple dumps, we can also track the change of address for the required objects. – vkm Sep 01 '21 at 08:38
  • The heap dump doesn’t contain object addresses. A heap dump doesn’t even contain the information whether the JVM used 32 bit or 64 bit addresses, no memory locations, no information about alignment nor padding. Instead of trying to map the IDs used in the heap dump to real memory addresses, you could read tea leaves. Recommended read https://shipilev.net/blog/2014/heapdump-is-a-lie/ – Holger Sep 01 '21 at 08:59
  • But anyway, does the MAT really say that there are no references to the objects in question? – Holger Sep 01 '21 at 10:16
  • @Holger Yes, MAT shows it as an unreachable object. Even the finalize() method ran after a minor GC. So there would definitely be no other reference. – vkm Sep 01 '21 at 11:29
  • "Instead of trying to map the IDs used in the heap dump to real memory addresses" - I am not relating it to the real address. This works well enough to identify the heap regions in my case. We started using it only after a certain amount of experimentation, and it is reasonably accurate for us. I understand your point, but the virtual address is still useful for debugging. @Holger – vkm Sep 01 '21 at 11:33
  • The IDs used in the heap dump could be anything, addresses, values somehow derived from addresses, or entirely unrelated like ascending numbers. However, interpreting them as addresses and matching them to the generations might provide plausible results in your case. In the end, it doesn’t matter, as the main issue is that an apparently unreachable object doesn’t get collected. The question is why does removing the `finalize` method still exhibit this behavior, a) is there still a `finalize` method, e.g. an inherited one, or b) are there phantom references (the only ones not getting cleared)? – Holger Sep 01 '21 at 12:21
  • @Holger "a) is there still a finalize method, e.g. an inherited one" - I checked this with the help of a heap dump. No Finalizer reference was found, so there is no chance of this. "b) are there phantom references" - From checking the code, it doesn't seem so. Is there a way to check that through a heap dump? – vkm Sep 01 '21 at 15:40
  • Assuming that some time elapsed between the first GC and the heap dump, checking the heap dump won’t be sufficient. After the completion of the `finalize()` method, the object will be unreachable (usually) and when whoever maintains a phantom reference (e.g. a cleaner) has been notified, they will clear the reference. In either case, this might be a very short time and have been done already when the heap dump is made. The best approach to preclude a finalizer is to deliberately add an empty `finalize() {}` method to the class. – Holger Sep 01 '21 at 16:30
  • Ok @Holger. During our lifetime-tracking test, the finalize() method we added ran as expected, so by the next minor GC the object should have been collected. But in our case, it is cleared only in the next old gen GC cycle. – vkm Sep 02 '21 at 05:34
  • Another interesting update: with G1GC, the objects are cleared properly. We checked this using the finalize() method: after the first cycle, the object became unreachable, and after the next minor cycle it was gone. **With G1GC the issue is not there** – vkm Sep 02 '21 at 05:51
  • G1GC is an entirely different beast. It even permits the possibility that not all objects get cleared, by design. Further, it has no fixed addresses for generations, so your approach to recognize whether an object has been promoted wouldn’t work at all. On the other hand, it’s not subject to the fragmentation issue that ParNewGC has. So if it solves your issue and exhibits reasonable performance with your application, go for it and stop worrying about outdated GC algorithms. While you are at it, it’s worth giving string deduplication a try. – Holger Sep 02 '21 at 06:47

0 Answers