
My problem in short:

  • I have a machine with 500 GB of RAM and no swap (more than enough): top shows ~500 GB of free RAM
  • I have a 20 GB file containing triplets (stringOfTypeX, stringOfTypeY, double val). For each string of type X, the file has on average 20-30 lines, each containing that string of type X plus one (different) string of type Y and the associated double value
  • I want to load the file into an in-memory index HashMap<StringOfTypeX, TreeMap<StringOfTypeY, val>>
  • I wrote a Java program that reads the file with BufferedReader.readLine() (see the sketch after this list)
  • in this program, the HashMap is initialized in the constructor with an initialCapacity of 2 times the expected number of distinct strings of type X (the expected number of keys)
  • I ran the program using: java -jar XXX.jar -Xms500G -Xmx500G -XX:-UseGCOverheadLimit
  • the program processes file lines slower and slower: at first it handles 2M lines per minute, but with each chunk of 2M lines it gets slower. After 16M lines it has almost stopped and, eventually, it throws java.lang.OutOfMemoryError: GC overhead limit exceeded
  • before it throws that error, top shows that it consumes 6% of the 500 GB of RAM, and this value stays constant; the program doesn't consume more RAM than this for the rest of its lifetime
  • I've read every internet thread I could find on this; nothing works. I guess the GC starts doing a lot of work, but I don't understand why it does this, given that I tried to give the HashMap enough RAM up front. Anyway, it seems the JVM cannot be forced to pre-allocate a big amount of RAM, no matter what command-line args I give. If this is true, what is the real use of the -Xmx and -Xms params?
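A minimal sketch of the loading loop (the tab separator, the file-name argument, and the EXPECTED_KEYS constant are placeholders; my real code differs only in details):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.TreeMap;

    public class LoadIndex {
        static final int EXPECTED_KEYS = 50_000_000; // placeholder

        public static void main(String[] args) throws Exception {
            // 2x the expected number of distinct type-X keys, as described above
            HashMap<String, TreeMap<String, Double>> index =
                    new HashMap<String, TreeMap<String, Double>>(2 * EXPECTED_KEYS);
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split("\t"); // stringOfTypeX, stringOfTypeY, val
                TreeMap<String, Double> inner = index.get(f[0]);
                if (inner == null) {
                    inner = new TreeMap<String, Double>();
                    index.put(f[0], inner);
                }
                inner.put(f[1], Double.parseDouble(f[2]));
            }
            in.close();
        }
    }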

Does anyone have any ideas? Many thanks!

Update:

  • my JVM is 64-bit
  • 6.1% of the 515 GB of RAM is ~32 GB. It seems the JVM will not use more than 32 GB. Following this post I tried to disable the use of compressed pointers with the flag -XX:-UseCompressedOops. However, nothing changed; the limit is still 32 GB.
  • no swap is done at any point in time (checked using top)
  • running with -Xms400G -Xmx400G doesn't solve the issue

3 Answers


It is fairly common to mis-diagnose these sorts of problems.

500 GB should be more than enough, assuming you have more than 500 GB of main memory; swap will not do.

A 20 GB file is likely to have a significant expansion ratio if you have Strings. E.g. a 16-character String will use about 80 bytes of memory, and a Double uses around 24 bytes in a 64-bit JVM, not the 8 bytes you might expect.

HashMap and TreeMap use about 24 extra bytes per entry.
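To get a feel for the numbers, here is a rough back-of-the-envelope in Java using only the figures above (all of them estimates, not measurements):

    // rough per-line heap cost, from the figures above (estimates only)
    int perLine = 80              // type-Y key String (~16 chars)
                + 24              // value boxed as java.lang.Double
                + 24              // TreeMap entry overhead
                + (80 + 24) / 25; // type-X String + HashMap entry,
                                  // amortised over the ~25 lines sharing it
    // perLine is ~132 bytes of heap versus roughly 40 bytes per line in the
    // file, so a 20 GB file can need on the order of 60-70 GB of heap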

Using readLine() and doubling the capacity is fine. Actually expected-size * 4/3 is enough, since the default load factor is 0.75, though HashMap always rounds the capacity up to the next power of 2.
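For example (a sketch; the 50M figure is a placeholder), a capacity that avoids any rehashing during the load:

    // HashMap resizes once size exceeds capacity * loadFactor (0.75 by
    // default), and rounds the requested capacity up to the next power of 2
    int expectedKeys = 50_000_000; // placeholder for the real key count
    HashMap<String, TreeMap<String, Double>> index =
            new HashMap<String, TreeMap<String, Double>>(
                    (int) (expectedKeys / 0.75f) + 1);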

Setting -Xms does preallocate the memory specified (or almost that number; it is often out by about 1% for no apparent reason).
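You can check this from inside the program (a sketch using the standard Runtime API); right after startup the committed heap should already be close to the -Xms value:

    Runtime rt = Runtime.getRuntime();
    System.out.println("committed heap: " + rt.totalMemory() / (1L << 30) + " GB");
    System.out.println("max heap:       " + rt.maxMemory() / (1L << 30) + " GB");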

2 M lines per minute is pretty slow. It suggests your overhead is already very high. I would be looking for something closer to 1 million lines per second.

16 million entries is nothing compared with the size of your JVM. My guess is you have started to swap and the error you see is because the GC is taking too long, not because the heap is too full.

How much free main memory do you have? E.g. what do you see in top after the application dies?

Peter Lawrey
  • Hi Peter, thanks for your reply! Please see my updated post above. I checked, and no swap started before reaching the 32 GB RAM limit. I am just wondering how I can remove this limit and let the JVM use the entire RAM. The flag -XX:-UseCompressedOops didn't help. – Octavian Ganea May 05 '14 at 11:55
  • If you have 32 GB of RAM, you want to limit the JVM to about 30 GB and use compressed oops. This means you cannot use the standard Java collections, as they have too much overhead; you will need custom ones, and more memory. – Peter Lawrey May 05 '14 at 12:27
  • @OctavianGanea: You wrote that you have 500 GB of real memory and that only about 30 GB are used, right? And switching compressed oops off seems not to help. Maybe you could find the size of a reference (`Unsafe` should do, but there are some tools) to verify it's really off. You could also set `-XX:ObjectAlignmentInBytes=16` and check if the available memory doubles. – maaartinus May 05 '14 at 18:21

Problem solved:

  • java -jar XXX.jar -Xms500G -Xmx500G -XX:-UseGCOverheadLimit is not correct. The JVM options must be specified before -jar; otherwise they are treated as arguments to main. The correct command line is java -Xms500G -Xmx500G -XX:-UseGCOverheadLimit -jar XXX.jar args[0] args[1] ... (see the sketch below).
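A quick way to see the difference (a sketch with a hypothetical Check class): options placed after the jar name end up in args, while real JVM options change maxMemory():

    public class Check {
        public static void main(String[] args) {
            // real JVM options raise maxMemory(); misplaced ones show up in args
            System.out.println("max heap (GB): "
                    + Runtime.getRuntime().maxMemory() / (1L << 30));
            System.out.println("main args: " + java.util.Arrays.toString(args));
        }
    }

With java -jar Check.jar -Xmx500G, the option shows up in main args and the max heap stays at the default; with java -Xmx500G -jar Check.jar, the max heap grows accordingly.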

Sorry for this, and thanks for your answers!


You say you have 500 GB of RAM. You shouldn't set -Xmx to 500 GB, because that is only the heap size; the VM itself has some memory overhead too. So it is not advisable to give all of the memory to the heap.

I would recommend profiling your application, for example with JVisualVM, or taking a heap dump to see what is really in memory. Maybe something is not being cleaned up.
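For example, a heap dump can be taken with the jmap tool that ships with the JDK (the file name is arbitrary; <pid> is your program's process id):

    jmap -dump:live,format=b,file=heap.hprof <pid>

The resulting heap.hprof can then be opened in JVisualVM.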

keiki