Why can the JVM recover from an OOM (Java heap space) by itself?

Question

Integer[][] data = new Integer[1000000][100000];

As in the simple demo code above, I tried to allocate a remarkably large amount of memory to trigger an OOM in a Pandora container (a web container developed by Alibaba, similar to Tomcat). But the error seems to affect ONLY the current request; the web service does NOT collapse. As far as I know, unlike an exception, an error in Java should NOT be recoverable and should affect the whole process. I am puzzled; please advise. Thanks.

You should typically not attempt to _handle_ instances of `Error`, but what makes you think the whole JVM should crash upon any `Error` being thrown? In particular, why should the JVM crash in your example? The OOME is thrown because there's not enough memory to allocate the huge two-dimensional array. That means the array is not actually allocated. In other words, there is still memory available after the OOME is thrown. — Slaw, Jul 05 '22 at 06:57

Stephen C · Answer 1 · 2022-07-05T11:57:29.593

First this:

As far as I know, unlike an exception, an error in Java should NOT be recoverable and should affect the whole process.

In general that is true. In the case of a web container, recovery from OOME on a request thread has a better chance of succeeding than for a typical multi-threaded application.

Why?

Because the work done to handle one web request on one worker thread is typically independent of other threads. That means that an OOME in a request is less likely to leave shared data structures in an inconsistent state.

But you still have the problem that the root cause of the OOME could be a memory leak ... and that most likely won't go away when the web container cleans up the request thread and creates a new one. Hence it is still dubious for a web container to recover from OOMEs.

But this is fairly common behavior anyway. I think that the reasoning is that attempting to recover with a reasonable chance of succeeding is better than failing fast.
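
To make the request isolation concrete, here is a minimal sketch. It uses a plain `ExecutorService` as a stand-in for a container's worker pool; this is an assumption for illustration, not how Pandora or Tomcat is actually implemented. The class name, the `handle` and `serve` methods, and the status strings are all hypothetical, and the OOME is simulated with an explicit `throw` so the example runs quickly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class RequestIsolationDemo {

    /** Handles one hypothetical request; request 2 fails with a simulated OOME. */
    static String handle(int requestId) {
        if (requestId == 2) {
            // Simulated; a real OOME would come from a failed allocation.
            throw new OutOfMemoryError("simulated: Java heap space");
        }
        return "request " + requestId + ": 200 OK";
    }

    /** Runs the requests on a small pool and reports one outcome per request. */
    static List<String> serve(int requestCount) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<String> outcomes = new ArrayList<>();
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int id = 1; id <= requestCount; id++) {
                final int requestId = id;
                Callable<String> task = () -> handle(requestId);
                futures.add(pool.submit(task));
            }
            for (int i = 0; i < futures.size(); i++) {
                try {
                    outcomes.add(futures.get(i).get());
                } catch (ExecutionException e) {
                    // The error thrown by the task is delivered here as the cause.
                    String cause = e.getCause().getClass().getSimpleName();
                    outcomes.add("request " + (i + 1) + ": 500 (" + cause + ")");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    outcomes.add("request " + (i + 1) + ": interrupted");
                }
            }
        } finally {
            pool.shutdown();
        }
        return outcomes;
    }

    public static void main(String[] args) {
        serve(3).forEach(System.out::println);
    }
}
```

Only the failing request reports an error; the requests before and after it complete normally because they share no state with it.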

So why is it possible to recover at all?

Consider this snippet:

   public void test() {
       try {
           Integer[][] data = new Integer[1000000][100000];
       } catch (OutOfMemoryError ex) {
           // log it
       }
       // do something else
   }

Observations:

- An OOME happens after the GC has run. The typical sequence of JVM actions leading up to an OOME is something like this:
  - Attempt to allocate a large object.
  - Find there is not enough free space.
  - Run a new space GC.
  - Try the allocation again.
  - Still not enough free space.
  - Run a full GC.
  - Try the allocation again.
  - Still not enough free space.
  - Throw OOME.
- The new Integer[1000000][100000] is an all-or-nothing thing. If it triggers an OOME, then the objects that it has allocated so far will all be unreachable. So if the // do something else code tries to allocate another object, and the heap is still full, then the JVM will run the GC again ... which will reclaim those unreachable objects ... and we are back in business.
- Even if it wasn't ... when data goes out of scope, the tree of Integer[][] and Integer[] objects that it refers to may now be detectable as unreachable by the GC.
- If the OOME is thrown on a child thread, and the thread is allowed to die, then all of the thread's local variables will no longer be reachable. That results in more (potentially) unreachable objects.

The point is that by the time you catch and recover from the OOME, there is likely to be some collectable garbage.
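
A runnable sketch of this recovery follows. To fail fast, it swaps the question's two-dimensional array for a single `long[]` sized from `Runtime.getRuntime().maxMemory()` so that it cannot fit in the heap. That sizing is my assumption, and on very large heaps (roughly 16 GB or more) the allocation might actually succeed. The class and method names are hypothetical.

```java
final class OomeRecoveryDemo {

    /** Tries to allocate an array larger than the whole heap; returns true if an OOME was caught. */
    static boolean tryHugeAllocation() {
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        // One element more than would fit in the entire heap, capped at a safe array length.
        int length = (int) Math.min(Integer.MAX_VALUE - 8L, maxHeapBytes / Long.BYTES + 1);
        try {
            long[] data = new long[length];
            return data.length < 0; // always false; only reached if the allocation succeeded
        } catch (OutOfMemoryError ex) {
            // The array was never created, so the heap is no fuller than before the attempt.
            return true;
        }
    }

    /** Allocates a small array, as the "do something else" code might. */
    static int allocateSmall() {
        byte[] small = new byte[1024 * 1024];
        return small.length;
    }

    public static void main(String[] args) {
        boolean caught = tryHugeAllocation();
        int size = allocateSmall();
        System.out.println("OOME caught: " + caught
                + ", later allocation of " + size + " bytes succeeded");
    }
}
```

The later small allocation succeeds because the failed request never consumed the heap in the first place.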

So if it is possible to recover, why do people advise against recovering from OOMEs at all?

- Because the OOME's cause may be a memory leak. Recovering from an OOME that is caused by a memory leak can result in poor performance. The heap will eventually fill up to a point where GC takes far too much time.
- Because an OOME can lead to a data structure being left in an inconsistent state; e.g. your code was updating it when it got the OOME.
- Because an OOME can break concurrent behaviors. For example, suppose thread A is waiting for a notify from thread B. If B gets an OOME, it may die completely, or it may attempt to recover ... at a point where its lock has been released. Either way, there is a risk that thread A will be stuck forever waiting for a notify that will never happen. (Thread B should probably trigger an application shutdown to avoid this.)
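
The lost-notification hazard can be sketched as follows. This is an illustration under assumptions: it uses a `CountDownLatch` with a bounded wait instead of raw `wait`/`notify`, so that the demo terminates rather than hanging, and the OOME in thread B is simulated. The `signalInFinally` switch shows one possible mitigation that is mine, not the answer's; the answer's own suggestion is to shut the application down.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LostSignalDemo {

    /** Returns true if the waiting thread received the signal before the timeout. */
    static boolean waitForWorker(boolean signalInFinally) {
        CountDownLatch done = new CountDownLatch(1);
        Thread b = new Thread(() -> {
            try {
                // Simulated failure before the normal signalling point is reached.
                throw new OutOfMemoryError("simulated: Java heap space");
            } finally {
                if (signalInFinally) {
                    done.countDown(); // wake the waiter even though the work failed
                }
            }
        }, "thread-B");
        b.setUncaughtExceptionHandler((t, e) -> { /* keep the demo output quiet */ });
        b.start();
        try {
            // The calling thread plays thread A; an unbounded await() would never return
            // in the unsignalled case.
            return done.await(1000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("signalled without finally: " + waitForWorker(false));
        System.out.println("signalled with finally:    " + waitForWorker(true));
    }
}
```

Without the signal in `finally`, the waiter only gets out because of the timeout, which is exactly the stuck-forever risk described above.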

Forketyfork · Answer 2 · 2022-07-05T08:34:42.383

The error OutOfMemoryError: Java heap space is similar to any other exception: it only causes the current thread to be terminated. It's only about running out of Java heap space. So if the thread is killed and all objects created in this thread become unreachable and may be collected, there's enough heap space to continue, and there's no reason for the whole application to crash.
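
As a small illustration of the uncaught case, the sketch below starts a worker thread that throws a simulated `OutOfMemoryError` and lets it escape. The class and method names are hypothetical, and the error is thrown explicitly rather than produced by a real allocation failure.

```java
final class ThreadDeathDemo {

    /** Runs a worker that dies from a simulated OOME; returns true once it has terminated. */
    static boolean workerDiesCallerSurvives() {
        Thread worker = new Thread(() -> {
            // Simulated for speed and determinism.
            throw new OutOfMemoryError("simulated: Java heap space");
        }, "worker");
        // Replace the default stack-trace printing with a one-line message.
        worker.setUncaughtExceptionHandler((t, e) ->
                System.out.println(t.getName() + " terminated by " + e));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
        // The worker is gone, but this thread is still executing.
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("caller still running after worker died: " + workerDiesCallerSurvives());
    }
}
```

The calling thread keeps running after the worker has terminated; nothing about the error propagates to other threads on its own.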

The application itself will only be killed by the operating system if it exceeds the memory that the operating system can provide. There are, however, command-line options in some JVMs that explicitly crash the JVM on any out-of-memory error, e.g. -XX:+CrashOnOutOfMemoryError.
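
If you want to confirm whether a running JVM was started with such an option, one way is to inspect its input arguments. The sketch below is a hedged example: `hasFlag` and the class name are hypothetical helpers, and `-XX:+ExitOnOutOfMemoryError` is a related HotSpot option that I have added alongside the one named above. Options set through other channels may not show up in this list.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class OomeFlagCheck {

    /** Returns true if the exact flag string appears in the given JVM argument list. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("CrashOnOutOfMemoryError set: "
                + hasFlag(jvmArgs, "-XX:+CrashOnOutOfMemoryError"));
        System.out.println("ExitOnOutOfMemoryError set:  "
                + hasFlag(jvmArgs, "-XX:+ExitOnOutOfMemoryError"));
    }
}
```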

edited Jul 05 '22 at 08:34

answered Jul 05 '22 at 07:05

An exception does not “crash the current thread” at all. It transfers control to the closest matching exception handler. It’s a well defined procedure. – Holger Jul 05 '22 at 07:20
@Holger sure, I think it's obvious. I didn't mention such technicalities as "if the exception is not caught and escapes the thread", as they are beyond the point here, and such types of exceptions as `OutOfMemoryError` are not usually caught anyway. – Forketyfork Jul 05 '22 at 07:23
Maybe you think it is obvious. But what about the OP? – Stephen C Jul 05 '22 at 07:25
@StephenC I don't know whether the fact that you can catch exceptions in Java is obvious for OP, but as I said, I don't see the relevance to the question. The question was essentially "why only the thread is killed in this case, and not the application itself". I believe I've answered it, but OP is free to ask for clarification. – Forketyfork Jul 05 '22 at 07:33
To be fair, there's no evidence in the OP's question that a thread is being _killed_. The OP only states that the current _request_ fails (the OP did their test in a web server/container). – Slaw Jul 05 '22 at 07:38
@Slaw sure, the OP used the word "collapsed", which I assumed might mean "killed", especially since this is exactly what happens in this case on Tomcat. I know nothing about the "alibaba pandora container", but I assumed there's no special magic there, and the behavior is indeed similar to Tomcat. – Forketyfork Jul 05 '22 at 07:40
The word 'crash' is out of place here. The exception is catchable. A crash is a catastrophe in computing. This isn't. – user207421 Jul 05 '22 at 08:28
@user207421 thanks for pointing this out, "crash" indeed doesn't make sense in relation to the Java thread termination, I'll replace it with "causes the current thread to be terminated". – Forketyfork Jul 05 '22 at 08:34
