
I'm running java with java -Xmx240g mypackage.myClass

OS is Ubuntu 12.10.

top reports MiB Mem 245743 total, and shows the java process with VIRT at 254g from the very beginning, while RES steadily climbs to 169g. At that point it looks like it starts garbage collecting heavily: my program is single-threaded at that stage, and CPU% stays mostly at 100% until then, but then jumps to around 1300-2000 (so I conclude the multithreaded garbage collector has kicked in). RES then slowly creeps up to 172g, at which point java crashes with

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

at the line with new double[2000][5]

java -version says

java version "1.7.0_15"
OpenJDK Runtime Environment (IcedTea7 2.3.7) (7u15-2.3.7-0ubuntu1~12.10)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)

Hardware is an Amazon cr1.8xlarge instance.

It seems to me that java crashes even though there is still a lot of memory available. That is clearly impossible, so I must be misreading some of these numbers. Where should I look to understand what's going on?

Edit:

I don't specify any GC options. The only command-line option is -Xmx240g

My program works successfully on many inputs, and top has sometimes reported it using up to 98.3% of memory. However, I can reproduce the situation described above with a certain program input.

Edit2:

This is a scientific application. It builds a gigantic tree (1-10 million nodes); each node holds a couple of double arrays of roughly 300x3 to 900x5. After the initial tree creation the program does not allocate much more memory; most of the time it just performs arithmetic operations on these arrays.
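Schematically, a node looks something like this (simplified and only illustrative, not the actual code):

class TreeNode {
    TreeNode[] children;
    double[][] a;   // roughly 300x3 up to 900x5
    double[][] b;   // second array of similar size
}

A 900x5 double[][] takes about 900*5*8 bytes of data plus ~900 inner-array headers and the outer array, i.e. on the order of 50-60 KB, so a couple of such arrays per node across millions of nodes can add up to 100 GB or more.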

Edit3:

The HotSpot JVM died the same way: it used a lot of CPU around the 170-172g mark and crashed with the same error. It looks like 70-75% of memory is some magic line the JVM does not want to cross.

Final solution: with -XX:+UseConcMarkSweepGC -XX:NewRatio=12 the program made it past the 170g mark and is happily working on.
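For reference, the full invocation then looks roughly like this (package and class name as above):

java -Xmx240g -XX:+UseConcMarkSweepGC -XX:NewRatio=12 mypackage.myClass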

  • How much memory is your program allocating? A new double[2000][5] can cost up to 128024 bytes of memory - since you don't even have 2x that available, an OOM error should certainly not be unexpected. – torquestomp Apr 19 '13 at 00:48
  • torquestomp, new double[2000][5] can cost up to 128k, and I have 70 _gigabytes_ available –  Apr 19 '13 at 00:49
  • 1
    +1 for a 240g heap on IcedTea. That takes balls. – jonathan.cone Apr 19 '13 at 00:50
  • jonathan.cone, please clarify, should I use another JVM with that amount of memory? –  Apr 19 '13 at 00:52
  • What GC options are you using? – jonathan.cone Apr 19 '13 at 00:54
  • jonathan.cone, updated the question –  Apr 19 '13 at 01:00
  • use jvisualvm (or other available tools) to find out where the memory leak is. It could be possible that new double[2000][5] is not causing trouble. – sarahTheButterFly Apr 19 '13 at 01:02
  • Java, particularly with its GCing, is not really designed to run with such extreme amounts of memory. You should look into "off heap" storage frameworks like Ehcache and BigMemory Go. Otherwise, GCing is going to take forever at that scale. – pickypg Apr 19 '13 at 01:15
  • possibly related: http://stackoverflow.com/questions/1949904/why-am-i-able-to-set-xmx-to-a-value-greater-than-physical-and-virtual-memory-on – MarianP Apr 19 '13 at 01:53
  • I'd try a non-Oracle JVM. – MarianP Apr 19 '13 at 01:55

3 Answers


Analysis

The first thing you need to do is get a heap dump so you can figure out exactly what the heap looks like when the JVM crashes. Add this set of flags to the command line:

-XX:+HeapDumpOnOutOfMemoryError -verbose:gc -XX:+PrintGCDetails

When a crash happens, the JVM is going to write the heap out to disk, and frankly, it's going to take a long time on a heap that size. Download Eclipse MAT, or install the plugin if you're already running Eclipse. From there, you can load up the heap dump and run a couple of canned reports. You'll want to check the Leak Suspects report and the Dominator Tree to see where your memory is going and confirm that you don't have an actual leak.
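Putting it together, the invocation might look something like this (the dump path is only an illustration; -XX:HeapDumpPath is optional and the dump otherwise lands in the working directory):

java -Xmx240g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps -verbose:gc -XX:+PrintGCDetails mypackage.myClass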

After that, I would recommend you read this document by Oracle about garbage collection; in the meantime, here are some things you can consider:

Concurrent GC

-XX:+UseConcMarkSweepGC 

I've never heard of anyone getting away with using the parallel-only collector on a heap that size. You can activate the concurrent collector, and you'll want to read up on incremental mode and determine whether it's right for your workload / hardware combo.
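On JDK 7, incremental mode is switched on with an extra flag, roughly like this (it was later deprecated, so treat it as something to benchmark rather than a given):

-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode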

Heap Free Ratio

-XX:MinHeapFreeRatio=25

Dial this down to lower the bar for the garbage collector when you do a full collection. This may prevent you from running out of memory during a full collection. The default is 40%; experiment with smaller values.

New Ratio

-XX:NewRatio

We'll need to hear more about your actual workload: is this a webapp? A Swing app? How long objects are expected to remain alive on the heap will have an impact on the right value for the new ratio. Server-mode VMs like the one you're running use a fairly high new ratio by default (8:1); that may not be ideal if you have a lot of long-lived objects.
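As a rough illustration of the arithmetic (not a tuning recommendation):

-XX:NewRatio=12

makes the young generation 1/(12+1) of the heap, so on a 240g heap roughly 18g goes to the young generation and about 222g to the tenured generation, which leaves more room for long-lived objects.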

jonathan.cone
  • Thank you very much for verbose answer. I updated the question, I do have a lot of long-lived objects. –  Apr 19 '13 at 01:52
  • Please clarify, how exactly will MinHeapFreeRatio help me? If the JVM doesn't have enough memory for a full garbage collection now, why would less memory be enough? – Apr 19 '13 at 02:29
  • The ratio dictates what percentage of the heap must be free after garbage collection. If the number is too high, you will run out of heap space sooner because the collector cannot scavenge enough memory to meet the required ratio. This would come into play if you're running out of heap space during a full GC. – jonathan.cone Apr 19 '13 at 03:20

As general advice, NEVER use OpenJDK, least of all for production environments; it is much slower than the one from Sun/Oracle.

Apart from that, I have never seen a VM using so much memory, but I guess that is what you need (or maybe your code uses more memory than it needs?).

EDIT: OpenJDK is fine for servers; the only differences from the Sun/Oracle JDK concern desktop stuff (sound, GUI...), so ignore that part.

Juan Antonio Gomez Moriano
  • Thank you, I'll try the Sun JVM. And yes, I need a sick amount of memory. – Apr 19 '13 at 01:07
  • 1
    Regarding the memory, think if you need so much, do not get me wrong, I am sure you have a reason to use 70GB but in my own experience, sometimes just checking again the problem from another perspective saves you a lot of time/memory/pain :) Maybe you can use different algorithms or data structures? – Juan Antonio Gomez Moriano Apr 19 '13 at 01:09
  • Please stop spreading FUD about OpenJDK being slower or whatever. That may have been true 3 or 4 years ago. For server Java, OpenJDK 7 is the same as Oracle JDK 7. – Denis Tulskiy Apr 19 '13 at 03:20
  • @DenisTulskiy That is not my experience; however, I would be happy to find I am wrong. Can you please point me to a good source to support your assertion? – Juan Antonio Gomez Moriano Apr 19 '13 at 03:32
  • 1
    According to this article: http://weblogs.java.net/blog/robogeek/archive/2007/10/openjdk_encumbr.html the parts that were reimplemented in openjdk were font rendering, java2d and javasound. so mostly desktop stuff. For the crypto stuff, I think they open sourced existing code. Moreover, OpenJDK 7 is now the reference implementation of JDK7. So for server code, HotSpot implementation and class library is identical. – Denis Tulskiy Apr 19 '13 at 04:00

If I understood your question correctly, it looks like the memory leak is actually happening before the program hits the line new double[2000][5]. It seems memory is already low by the time that line is reached, so the JVM throws when it asks for more.

I would use jvisualvm or similar tools to find out where the memory leak is. The memory leaks I've encountered mostly had to do with Strings being created in a loop, caches not being cleared, etc.
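If attaching a GUI tool to a process this large is impractical, a class histogram from jmap can give a quick first impression of what is filling the heap (<pid> is a placeholder for the Java process id):

jmap -histo <pid>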

sarahTheButterFly