11

We have an application that spawns new JVMs and executes code on behalf of our users. Sometimes those JVMs run out of memory, and in that case they behave in very different ways. Sometimes they throw an OutOfMemoryError, sometimes they freeze. I can detect the latter with a very lightweight background thread that stops sending heartbeat signals when memory runs low. In that case we kill the JVM, but we can never be absolutely sure what the real reason for the missing heartbeat was. (It could just as well have been a network issue or a segmentation fault.)
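
For illustration, the heartbeat sender inside the forked JVM looks roughly like this (simplified; the interval and the way the heartbeat is transported are placeholders for what we actually use):

    /** Simplified sketch of the heartbeat sender that runs inside the forked JVM. */
    public class Heartbeat {

        public static void start() {
            Thread t = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    // When the heap is nearly exhausted and the JVM spends its time in
                    // back-to-back full GCs, this thread effectively stops running, so
                    // the parent process stops receiving heartbeats.
                    System.out.println("HEARTBEAT " + System.currentTimeMillis());
                    try {
                        Thread.sleep(1000); // interval is arbitrary
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }, "heartbeat");
            t.setDaemon(true); // must not keep the forked JVM alive on its own
            t.start();
        }
    }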

What is the best way to reliably detect out of memory conditions in a JVM?

  • In theory, the -XX:OnOutOfMemoryError option looks promising, but it is effectively unusable due to this bug: https://bugs.openjdk.java.net/browse/JDK-8027434

  • Catching an OutOfMemoryError is actually not a good alternative for well-known reasons (e.g. you never know where it happens), though it does work in many cases.

  • The cases that remain are those where the JVM freezes and does not throw an OutOfMemoryError. I'm still sure the memory is the reason for this issue.

Are there any alternatives or workarounds? Garbage collection settings to make the JVM terminate itself rather than freezing?

EDIT: I'm in full control of both the forking and the forked JVM, as well as the code executed within them; both are running on Linux, and it's OK to use OS-specific utilities if that helps.

Simon Fischer
  • It sounds like what you're really interested in is detecting when an out-of-memory has occurred in *another process*; a key point which is not even hinted at by the title of your question. – supercat Oct 08 '14 at 17:39
  • Thanks. I tried to make that more clear in the post, but I haven't changed the title so far, since all alternative titles I was able to come up with were misleading. In particular I don't mind whether we get the information from inside the JVM, from the calling JVM, by looking at it as a JVM with its specific behaviour, or by just looking at it as a process. – Simon Fischer Oct 08 '14 at 18:02
  • If you don't improve your title, many people who might be able to answer are unlikely to even open your post. Perhaps "Triggering an alarm if a Java VM process runs out of memory" would be a better title? – supercat Oct 08 '14 at 18:08
  • "*Catching an OutOfMemoryError*" - I guess, you're already using `setDefaultUncaughtExceptionHandler`, right? In theory, you could allocate a few megabyte and free them when an OOME happens, so the error handler gets better chances to survive. Just guessing... – maaartinus Oct 10 '14 at 08:18
  • For more information about why catching an OutOfMemoryError is problematic, see https://stackoverflow.com/questions/8728866/no-throw-virtualmachineerror-guarantees – Raedwald Nov 15 '17 at 16:32
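
For reference, a minimal sketch of the reserve-buffer idea from the comment above; the class name, buffer size and exit code are made up for illustration:

    public class OomeGuard {

        // A few MB kept in reserve so the handler itself has heap to work with.
        private static volatile byte[] reserve = new byte[4 * 1024 * 1024];

        public static void install() {
            Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
                if (error instanceof OutOfMemoryError) {
                    reserve = null; // release the reserve before doing anything else
                    System.err.println("OutOfMemoryError in thread " + thread.getName());
                    Runtime.getRuntime().halt(42); // hypothetical exit code for the parent
                }
            });
        }
    }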

3 Answers

2

The only real option is (unfortunately) to terminate the JVM as soon as possible.

You probably can't change all of your code to catch the error and respond. If you don't trust -XX:OnOutOfMemoryError (I wonder why it would not use vfork, which is used by Java 8, and it works on Windows), you can at least trigger a heap dump and monitor externally for those files:

java .... -XX:+HeapDumpOnOutOfMemoryError "-XX:OnOutOfMemoryError=kill %p"
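
A sketch of what the external check in the controlling process could look like once the forked JVM has terminated; the file names follow the HotSpot defaults (java_pid<pid>.hprof and hs_err_pid<pid>.log in the working directory), so adjust them if you set -XX:HeapDumpPath:

    // Called in the controlling JVM after the child process with the given pid has exited.
    static boolean childLeftOomArtifacts(long pid, java.io.File workDir) {
        java.io.File heapDump = new java.io.File(workDir, "java_pid" + pid + ".hprof");
        java.io.File crashLog = new java.io.File(workDir, "hs_err_pid" + pid + ".log");
        return heapDump.exists() || crashLog.exists();
    }
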
рüффп
eckes
  • The -XX:+HeapDumpOnOutOfMemoryError is in fact an option we have not tried yet. As of now, it seems the dump does not get created reliably either. WRT the -XX:OnOutOfMemoryError: it does work, but only if the OS has ~50% of total memory still available, in which case I'd rather give it to the JVM than reserve it for this purpose only :-) – Simon Fischer Oct 09 '14 at 22:08
  • @SimonFischer IMHO the JRE switched to vfork for Runtime#exec(); I am not sure if this also covers OnOutOfMemoryError. But of course it is correct that it might not be able to execute a command under space-constrained conditions. I am not sure whether the CrashReporter server and OS crash reporting are options. After all, you would need to check for a vanished process anyway. – eckes Oct 13 '14 at 00:00
1

After experimenting with this for quite some time, this is the solution that worked for us:

  1. In the spawned JVM, catch an OutOfMemoryError and exit immediately, signalling the out-of-memory condition to the controller JVM via a dedicated exit code.
  2. In the spawned JVM, periodically check the amount of memory consumed, as reported by the current Runtime. When the amount used gets close to critical, create a flag file that signals the out-of-memory condition to the controller JVM. If we recover from this condition and exit normally, delete that file before we exit. (A sketch of steps 1 and 2 follows this list.)
  3. After the controlling JVM joins the forked JVM, it checks the exit code produced in step (1) and the flag file created in step (2). In addition, it checks whether the file hs_err_pidXXX.log exists and contains the line "Out of Memory Error". (This file is generated by java in case it crashes.)
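
A condensed sketch of the forked-JVM side of steps (1) and (2); the exit code, threshold, check interval and flag-file name are placeholders for whatever your controller expects:

    public class OomWatchdog {

        private static final int OOM_EXIT_CODE = 42;                           // placeholder
        private static final double CRITICAL_RATIO = 0.95;                     // placeholder
        private static final java.io.File FLAG = new java.io.File("oom.flag"); // placeholder

        /** Step 2: background thread that compares used memory against the threshold. */
        static void startMemoryCheck() {
            Thread t = new Thread(() -> {
                Runtime rt = Runtime.getRuntime();
                while (true) {
                    long used = rt.totalMemory() - rt.freeMemory();
                    if ((double) used / rt.maxMemory() > CRITICAL_RATIO) {
                        try { FLAG.createNewFile(); } catch (java.io.IOException ignored) { }
                    }
                    try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
                }
            }, "memory-check");
            t.setDaemon(true);
            t.start();
        }

        /** Step 1: run the user code and exit with a dedicated code on OutOfMemoryError. */
        public static void main(String[] args) {
            startMemoryCheck();
            try {
                runUserWorkflow(args);   // hypothetical entry point for the submitted workflow
                FLAG.delete();           // recovered / finished normally: remove the flag
            } catch (OutOfMemoryError e) {
                Runtime.getRuntime().halt(OOM_EXIT_CODE); // exit immediately, skip shutdown hooks
            }
        }

        private static void runUserWorkflow(String[] args) {
            // user code would be invoked here
        }
    }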

Only after implementing all of those checks were we able to handle all cases where the forked JVM ran out of memory. We believe that since then, we have not missed a case where this happened.
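
For completeness, the controller-side check of step (3) boils down to something like the following; the exit code and flag-file name match the placeholders in the sketch above, and the pid is whatever you obtained when spawning the process:

    // Controller side, called after Process#waitFor() has returned the exit code.
    static boolean forkedJvmRanOutOfMemory(int exitCode, long pid, java.io.File workDir)
            throws java.io.IOException {
        if (exitCode == 42) {                                   // exit code from step (1)
            return true;
        }
        if (new java.io.File(workDir, "oom.flag").exists()) {   // flag file from step (2)
            return true;
        }
        java.io.File hsErr = new java.io.File(workDir, "hs_err_pid" + pid + ".log");
        if (hsErr.exists()) {
            for (String line : java.nio.file.Files.readAllLines(hsErr.toPath())) {
                if (line.contains("Out of Memory Error")) {     // crash log check, step (3)
                    return true;
                }
            }
        }
        return false;
    }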

The java flag -XX:OnOutOfMemoryError was not used because of the fork problem, and -XX:+HeapDumpOnOutOfMemoryError was not used because a heap dump is more than we need.

The solution is certainly not the most elegant piece of code ever written, but it did the job for us.

Simon Fischer
  • Could you explain the "fork problem" - I am currently adding this to one of our services and can't find more about the fork issue. – Claim Aug 24 '23 at 11:10
0

If you have control over both the application and its configuration, the best solution would be to find the underlying cause of the OutOfMemoryError being thrown and fix it, instead of trying to hide the symptoms either by catching the error or by just restarting JVMs.

From what you describe, it definitely looks like the application running on the JVM is either leaking memory, running with under-provisioned resources (memory, in your case), or occasionally processing transactions that require abnormally large chunks of heap. The solutions for those cases differ:

  1. In case of a memory leak, find the underlying cause and have engineers fix it. Tools for this include heap dump analyzers, profilers and leak detectors.
  2. In case of under-provisioned resources, you need to monitor the application's memory consumption, for example via garbage collection logs, and adjust the sizes of the different memory pools based on what you see (see the example flags after this list).
  3. In case of surge allocations during user transactions, you need to trace down the code causing the surge and have engineers fix it, for example by disabling certain user inputs or by loading and processing the data in smaller batches. Either thread dumps or heap dumps from the processes can guide you towards the solution.
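
For point 2, one possible set of HotSpot flags (Java 8 era) that produces garbage collection logs and a heap dump on OutOfMemoryError; the log path, dump path and jar name are only examples:

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -Xloggc:/var/log/myapp/gc.log \
         -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/myapp \
         -jar myapp.jar
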
Flexo
Ivo
  • From the question I assumed there was no single cause for the OutOfMemory, just arbitrary user code being run. I don't think he's asking how to fix the user's code – matt freake Oct 09 '14 at 09:00
  • That's correct. People submit workflows (think of it as visual programming) to us. We run them in forked JVMs. If such a workflow performs a matrix operation on a 16 GB data file while choosing to do so on an 8 GB machine, this can't work. The caller has to fix it, but we need to tell them that the memory is the problem, not a JVM bug or other bug on our end. – Simon Fischer Oct 09 '14 at 22:00
  • In this case, Plumbr (https://plumbr.eu) would do just that - in case of a memory leak, for example, you would get the exact root cause, which you can ship to engineering; based on that, they can immediately zoom in on the underlying problem, as the incident reports from Plumbr refer back to the exact line in the source code causing the issue. – Ivo Oct 10 '14 at 17:15