59

I am attempting to run a Java application in a cluster computing environment (IBM LSF running CentOS release 6.2 Final) that can provide up to 1TB of RAM.

I can create a JVM with up to 300GB of maximum memory (-Xmx), although I need more than that (I can provide details, if requested).

However, it seems to be impossible to create a JVM with more than 300GB of maximum memory using the -Xmx option. To be more specific, I get the classic error message:

Error occurred during initialization of VM.

Could not reserve enough space for object heap.

The details of my (64-bit) JVM are below:

OpenJDK Runtime Environment (IcedTea6 1.10.6) (rhel-1.43.1.10.6.el6_2-x86_64)

OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

I've also tried a Java 7 64-bit JVM, but I had exactly the same problem.

Moreover, I tried to create a JVM just to run a HelloWorld.jar, but JVM creation still fails if I ask for more than -Xmx300G, so I don't think it has anything to do with the specific application.


Does anyone have any idea why I cannot create a JVM with more than 300G of max memory?

Can anyone please suggest a solution/workaround?

critichu
  • Three close votes and many upvotes! The question may not be code related, but the answer will come from a developer. – Jayan Mar 28 '14 at 11:28
  • Have you straced it to see at what point it fails? – fge Mar 28 '14 at 11:30
  • See question http://stackoverflow.com/questions/2093679/max-memory-for-64bit-java; IMHO you are missing something. Please try with java -Xmx300g -version – Jayan Mar 28 '14 at 11:30
  • Looks like the parameter has nothing to do with it; Java is not complaining about you specifying too large a value, it is reporting that it cannot reserve as much as you specify. As in, it is physically incapable of doing it. You have to investigate why not; I'd start from the perspective of the OS. – Gimby Mar 28 '14 at 11:37
  • What kind of OS do you use? – Bitman Mar 28 '14 at 12:17
  • Also, have you tried anything other than OpenJDK? – Bitman Mar 28 '14 at 12:18
  • Do you have enough memory? Try freeing up some processes. – James Parsons Mar 28 '14 at 12:21
  • You may have 1TB of RAM, but not in a contiguous segment. The OS is therefore not able to provide this to the JVM? – johan d Mar 28 '14 at 12:37
  • Maybe the OS is limiting the memory that Java can take? – Marco Acierno Mar 28 '14 at 12:39
  • Actually things go sour because of the MAXIMUM heap space here, but that says nothing about how much memory Java reserves INITIALLY. It's not like Java is being instructed to reserve 300GB of memory in one go, unless the -Xms option is also specified to be the same as the -Xmx value. Is it? If so: don't do that. – Gimby Mar 28 '14 at 13:02
  • @joh What do you mean? Physical memory does not have to be a contiguous segment. http://en.wikipedia.org/wiki/Virtual_memory – ZhekaKozlov Mar 28 '14 at 13:04
  • @Bitman: I'm using CentOS release 6.2 (Final) – critichu Mar 31 '14 at 09:39
  • @Gimby: I tried both with setting Xms400g and without; the result was exactly the same. – critichu Mar 31 '14 at 18:28
  • @critichu Actually you should try with a small value for Xms, not a huge one. Try with 256m or something. – Gimby Apr 01 '14 at 07:27
  • Try different garbage collectors. I prefer G1, but an older one may be more reliable. – Aleksandr Dubinsky Apr 02 '14 at 03:24
  • @fge: I did run strace; it failed at some futex. I don't think it adds any information, but here's the last line in that trace: futex(0x7f5cedc389d0, FUTEX_WAIT, 53887, NULL – critichu Apr 02 '14 at 13:54
  • On the node itself, free -m gives me: Mem: total 1033945, used 323272, free 710673, shared 0, buffers 17421, cached 220497, so about 710GB of "free" memory. – critichu Apr 02 '14 at 15:31
  • Are you ready for when the JVM is going to garbage collect 300GB? – Chiron Apr 04 '14 at 15:14

5 Answers

18

I can think of a couple of possible explanations:

  • Other applications on your system are using so much memory that there isn't 300GB available right now.

  • There could be a resource limit on the per-process memory size. You can check this using ulimit; a few commands for this are sketched right after this list. (Note that according to this bug, you will get this error message if the per-process resource limit stops the JVM from allocating the heap regions.)

  • It is also possible that this is an "over commit" issue; e.g. if your application is running in a virtual machine and the system as a whole cannot meet the demand because there is too much competition from other virtual machines.

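A minimal sketch of how those possibilities could be checked from a shell on the node (assuming standard CentOS tools; nothing here is specific to the JVM):

ulimit -a                            # per-process limits; look at "max memory size" and "virtual memory"
cat /proc/sys/vm/overcommit_memory   # kernel overcommit policy (0 = heuristic, 1 = always, 2 = never)
cat /proc/sys/vm/overcommit_ratio    # only relevant when overcommit_memory is 2
free -g                              # memory currently free on the node, in GB
grep -i commit /proc/meminfo         # CommitLimit / Committed_AS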

A couple of the other ideas suggested are (IMO) unlikely:

  • Switching the JRE is unlikely to make any difference. I've never heard of or seen arbitrary memory limits in specific 64-bit JVMs.

  • It is unlikely to be due to not having enough contiguous memory. Certainly contiguous physical memory is not required. The only possibility might be contiguous space on the swap device, but I don't recall that being an issue for typical Linux OSes.


Can anyone please suggest a solution/workaround?

  • Check the ulimit.

  • Write a tiny C program that attempts to malloc lots of memory and see how much it can allocate before it fails (a rough sketch follows this list).

  • Ask the system (or hypervisor) administrator for help.
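
A rough sketch of such a test, assuming gcc is available on the node (the file path and the size range are illustrative; with Linux overcommit enabled, malloc may succeed even when the memory cannot actually be backed, which is acceptable here because the JVM likewise only reserves the heap up front):

cat > /tmp/bigmalloc.c <<'EOF'
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t gb = 1024UL * 1024UL * 1024UL;
    /* Try increasingly large single blocks, 100GB .. 1000GB in 50GB steps. */
    for (size_t n = 100; n <= 1000; n += 50) {
        void *p = malloc(n * gb);
        if (p == NULL) {
            printf("malloc of %zu GB failed\n", n);
            return 1;
        }
        printf("malloc of %zu GB succeeded\n", n);
        free(p);
    }
    return 0;
}
EOF
gcc -O2 -o /tmp/bigmalloc /tmp/bigmalloc.c && /tmp/bigmalloc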

Stephen C
  • Another thought about non-contiguous memory: the JVM might map some libraries or other internal stuff into memory regions at 300GB. After that it tries to allocate the heap memory and does not find a contiguous address region. That would be a JVM bug, and trying another JVM like JRockit might not help. – A.H. Mar 30 '14 at 09:34
  • I suppose that is possible, but do you have any evidence that such a bug actually exists? Like a link to an entry in the Java bug database? – Stephen C Mar 30 '14 at 11:01
  • I did not claim that there is an actual bug. But the OP could check with pmap and different memory settings whether there is some memory region whose address does not move when increasing the settings. – A.H. Mar 30 '14 at 12:06
  • @StephenC: ulimit (and ulimit -a) show memory as "unlimited" – critichu Mar 31 '14 at 09:53
  • @critichu - try the other things. I'm only suggesting these as *possible* causes ... in the absence of any real evidence as to what is happening. – Stephen C Mar 31 '14 at 10:11
15

(edited, see added section on swap space)

SHMMAX and SHMALL

Since you are using CentOS, you may have run into a similar issue with the SHMMAX and SHMALL kernel settings as described here for configuring the Oracle DB. Under that same link is an example calculation for getting and setting the correct SHMALL setting.
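
A quick way to inspect, and as root temporarily raise, those two settings on CentOS is sketched below (the example values are illustrative only and should be calculated as described in the linked article):

cat /proc/sys/kernel/shmmax               # current maximum shared segment size, in bytes
cat /proc/sys/kernel/shmall               # current total shared memory limit, in pages
sysctl -w kernel.shmmax=343597383680      # example: ~320GB in bytes
sysctl -w kernel.shmall=83886080          # the same amount expressed in 4kB pages
# add the same keys to /etc/sysctl.conf to make the change survive a reboot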

Contiguous memory

Certain users have already reported that not enough contiguous memory is available, while others have said it is irrelevant.

I am not certain whether the JVM on CentOS requires a contiguous block of memory. According to SAS, fragmented memory can prevent your JVM from starting up with a large -Xmx or -Xms memory setting, but other claims on the internet say it doesn't matter. I tried to prove or disprove that claim on my 48GB Windows workstation and managed to start the JVM with an initial and maximum setting of 40GB. I am pretty sure that no contiguous block of that size was available, but JVMs on different OSes may behave differently, because memory management can differ per OS (i.e., Windows typically hides the physical addresses from individual processes).

Finding the largest contiguous memory block

Use /proc/meminfo to find the largest contiguous memory block available; see the value under VmallocChunk. Here's a guide and explanation of all values. If the value you see there is smaller than 300GB, try a value that falls right under the value of VmallocChunk.

However, this number is usually higher than the physically available memory (because it is the available virtual memory value), so it may give you a false positive. It is the value you can reserve, but once you start using it, it may require swapping. You should therefore also check the MemFree and Inactive values. Alternatively, you can look at the whole list and see which values do not surpass 300GB.
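
For example, on the node itself the relevant fields can be pulled out directly (field names as they appear in /proc/meminfo on CentOS 6):

grep -E 'MemTotal|MemFree|Inactive|CommitLimit|VmallocChunk' /proc/meminfo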

Other tuning options you can check for 64 bit JVM

I am not sure why you seem to hit a memory limit at 300GB. For a moment I thought you might have hit a maximum number of pages. With the default page size of 4kB, 300GB gives 78,643,200 pages, which doesn't look like a well-known magic number. If, for instance, 2^24 were the maximum, then 16,777,216 pages, or 64GB, would be your theoretical allocatable maximum.

However, suppose for the sake of argument that you need larger pages (which, as it turns out, is better for the performance of large-memory Java applications); in that case you should consult this manpage on JBoss, which explains how to use -XX:+UseLargePages and set kernel.shmmax (there it is again), vm.nr_hugepages and vm.hugetlb_shm_group (not sure the latter is required).
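
A hedged sketch of what that setup could look like for a 300GB heap with 2MB huge pages (the numbers are illustrative, and the group id for vm.hugetlb_shm_group depends on your system):

sysctl -w vm.nr_hugepages=153600          # 300GB / 2MB = 153,600 huge pages
sysctl -w kernel.shmmax=343597383680      # must cover the whole heap, in bytes
sysctl -w vm.hugetlb_shm_group=1001       # gid of the group allowed to use huge pages
java -XX:+UseLargePages -Xmx300g -jar HelloWorld.jar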

Stress your system

Others have suggested this already as well. To find out whether the problem lies with the JVM or with the OS, you should stress test it. One tool you could use is Stresslinux. In this tutorial you can find some options. Of particular interest to you is the following command:

stress --vm 2 --vm-bytes 300G --timeout 30s --verbose

If that command fails, or locks your system, you know that the OS is limiting the use of that amount of memory. If it succeeds, we should try to tweak the JVM such that it can use the available memory.

EDIT Apr 6: check swap space

It is not uncommon for systems with very large amounts of internal memory to use little or no swap space. For many applications this may not be a problem, but the JVM requires the available swap space to be larger than the requested memory size. According to this bug report, the JVM will try to increase the swap space itself; however, as some answers in this SO thread suggest, the JVM may not always be capable of doing so.

Hence: check the currently available swap space with cat /proc/swaps or free and, if it is smaller than 300GB, follow the instructions on this CentOS manpage to increase the swap space for your system.
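
For instance, to check the swap and (as root) add more on CentOS 6 (the file path and the 100GB size are only placeholders; a dedicated swap partition is normally preferable to a swap file):

cat /proc/swaps                                    # configured swap devices/files
free -g                                            # total/used swap in GB
dd if=/dev/zero of=/extraswap bs=1M count=102400   # create a 100GB swap file
mkswap /extraswap
swapon /extraswap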

Note 1: we can deduce from bug report #4719001 that a contiguous block of available swap space is not a necessity. But if you are unsure, remove all swap space and recreate it, which should remove any fragmentation.

Note 2: I have seen several posts like this one reporting 0MB swap space and being able to run the JVM. That is probably because the JVM increases the swap space itself. It still doesn't hurt to try increasing the swap space by hand to find out whether it fixes your issue.

Premature conclusion

I realize that none of the above is an out-of-the-box answer to your question. I hope it gives you some pointers, though, to what you can try to get your JVM working. You might also try other JVMs if the problem turns out to be a limit of the JVM you are currently using, but from what I have read so far, no such limit should be imposed for 64-bit JVMs.

That you get the error right at initialization of the JVM leads me to believe that the problem is not with the JVM, but with the OS not being able to comply with the reservation of 300GB of memory.

My own tests showed that the JVM can access all virtual memory and doesn't care about the amount of physical memory available. It would be odd if the virtual memory were lower than the physical memory, but the VmallocChunk value should give you a hint in that direction (it is usually much larger).

Abel
4

If you have a look at the FAQ section of the Java HotSpot VM, it's mentioned that on 64-bit VMs there are 64 address bits to work with, and hence the maximum Java heap size is mainly limited by the amount of physical memory and swap space present on the system.

Theoretically, 64 address bits could address 18,446,744,073,709,551,616 bytes (2^64, about 16 exabytes), but the limitations above apply.

You have to use the -Xmx option to define the maximum heap size for the JVM; by default, Java uses 64MB + 30% = 83.2MB on 64-bit JVMs.

I tried the command below on my machine and it appeared to work fine.

java -Xmx500g com.test.TestClass

I also tried to define the maximum heap in terabytes, but that didn't work.
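
If you want to see what maximum heap size a particular JVM picks by default, and whether it accepts a given -Xmx at all, a quick check (on reasonably recent 64-bit JVMs) is:

java -XX:+PrintFlagsFinal -version | grep -i maxheapsize   # default/ergonomic maximum heap
java -Xmx300g -version                                     # does the VM even start with the requested heap?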

Shishir Kumar
0

Run ulimit -a as the JVM process's user and verify that the kernel isn't limiting your max memory size. You may need to edit /etc/security/limits.conf.
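
For example, entries like these (the user name is a placeholder for whatever account the LSF job runs under) would lift the address-space and data-segment limits:

# /etc/security/limits.conf
youruser  soft  as    unlimited
youruser  hard  as    unlimited
youruser  soft  data  unlimited
youruser  hard  data  unlimited
# log in again (or restart the LSF job) and re-check with: ulimit -a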

user3100381
0

According to this discussion, LSF does not pool node memory into a single shared space. You are using something else for that. Read that something's documentation, because it is possible it cannot do what you are asking it to do. In particular, it may not be able to allocate a single contiguous region of memory that spans all the nodes. Usually that's not necessary, as an application will make many calls to malloc. But the JVM, to simplify things for itself, wants to allocate (or reserve) a single contiguous region for the entire heap by effectively calling malloc just once. Or it could be something else related to whatever you are using to emulate a giant shared memory machine.
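
One hedged way to confirm that it is this single big reservation that fails (the exact sizes and flags in the trace will differ) is to trace the JVM start-up:

strace -f -e trace=mmap -o /tmp/jvm-mmap.trace java -Xmx300g -version
grep ENOMEM /tmp/jvm-mmap.trace   # a failed ~300GB mmap here points at the OS/cluster layer, not the JVM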

Aleksandr Dubinsky