
Our Java application has been running fine on RHEL 8.5. We have configured what we believe is sufficient heap space: 2048m. Even so, in January 2023 the JVM produced a heap.dump file. We analyzed the heap dump and found that it was a NACACK error.

After that, without deleting the heap.dump file, we simply rebooted the server and the application worked fine again. Within a few weeks, however, we ran into another problem:

java.lang.OutOfMemoryError: GC overhead limit exceeded
Dumping heap to /XYZ/jboss/server/log/heap.dump ...
Unable to create /XYZ/jboss/server/log/heap.dump: File exists.

Please find the below queries,

  1. Does rebooting the server while the heap.dump file is still present clear the heap memory area entirely?
  2. Is the new error caused by the previous heap.dump file not being cleared?
  3. What could cause the above error to recur so quickly?

Thanks.

Learner
    What's a "NACACK error"? 1. Yes, 2. No 3. See https://stackoverflow.com/questions/1393486/error-java-lang-outofmemoryerror-gc-overhead-limit-exceeded – tgdavies Apr 15 '23 at 07:26

1 Answer

  1. Does rebooting the server while the heap.dump file is still present clear the heap memory area entirely?

The heap will always be empty when the JVM starts / restarts.

The heap dump file is not read when a JVM starts. Its presence or absence is not relevant.

  2. Is the new error caused by the previous heap.dump file not being cleared?

No. See above.

The message says that the JVM OOME'd again but that the JVM wasn't willing (or able) to overwrite the existing dump file. So you didn't get a new heap dump.
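
If the file-name collision is a nuisance, one option is simply to delete the old heap.dump before restarting. Another is to point -XX:HeapDumpPath at a directory instead of a fixed file name; with HotSpot JVMs each dump is then written as java_pid<pid>.hprof, so a leftover dump from an earlier crash doesn't block a new one. A sketch of the relevant startup flags (the jar name is a placeholder):

java -Xmx2048m \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/XYZ/jboss/server/log \
     -jar your-app.jar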

  3. What could cause the above error to recur so quickly?

In general: bugs (probably in your application), a heap that is too small, or both. Causes include the following:

  • A buggy (or poorly designed) application can create in-memory data structures that are excessively large / use too much memory.

  • A buggy application can have memory leaks, meaning that the GC is unable to reclaim objects that are no longer needed. If your application runs for a few weeks before crashing, a memory leak should be the prime suspect. (See the sketch after this list.)

  • If you set the heap size too small for the problem that the application is solving, you will get OOMEs.

  • If too much time is being spent garbage collecting, you will get the "GC overhead limit exceeded" flavor of OOME. This is typically a sign that the heap is nearly full ... and the GC is running repeatedly in a last-ditch attempt to keep the application going. (A "garbage collection death spiral".)

    The root cause will most likely be one of the causes above.

  • Edge cases:

    • If the application makes excessive or inappropriate use of finalizers, cleaners, Reference objects, and the like, the GC's reference processing threads may not be able to keep up, leading to an OOME.

    • With some GCs, allocating an extremely large array can OOME because there isn't sufficient contiguous free space after the GC has run.

    • It is possible that the heap size is too large for the host computer to handle; i.e. there is insufficient RAM and / or swap space.
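
To make the memory-leak case concrete, here is a minimal sketch of a classic Java leak (the class and method names are made up for illustration). A long-lived static collection keeps a reference to everything added to it, so the GC can never reclaim those objects and the heap fills up over time:

import java.util.ArrayList;
import java.util.List;

public class LeakExample {

    // Lives as long as the class is loaded, i.e. effectively forever.
    private static final List<byte[]> CACHE = new ArrayList<>();

    // Imagine this is called once per request.
    public static void handleRequest() {
        // Adds ~1 MB that is never removed. The GC cannot reclaim it
        // because the static list still holds a reference to it.
        CACHE.add(new byte[1024 * 1024]);
    }

    public static void main(String[] args) {
        // With -Xmx2048m this eventually fails with an OutOfMemoryError,
        // often the "GC overhead limit exceeded" flavor, because the GC
        // runs back-to-back as the heap approaches full.
        while (true) {
            handleRequest();
        }
    }
}

A leak like this is usually easy to spot in a heap dump: a single collection dominates the retained heap.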

It is not possible to be more specific without examining your application in detail, and looking at the heap dump and the stacktrace from the OOME.
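
If you want to inspect the heap before the next crash, you can capture a dump from the running JVM on demand. A sketch, assuming a HotSpot JDK with the standard tools on the PATH (12345 stands in for your JVM's pid):

jmap -dump:live,format=b,file=/tmp/heap.hprof 12345

The resulting .hprof file can then be opened in a heap analyzer such as Eclipse MAT or VisualVM to find the largest retained object graphs.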

This Q&A may help, but I don't think it answers your question:

Stephen C