We have an application that is widely deployed (several hundred workstations run it, across many environments). At one site, and only that one site, we randomly get the following error:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
Operating system is Windows 7 64-bit. We are running a 32-bit JVM (1.7.0_45).
Using Windows Task Manager, I can see that the process has 39 native threads (not very many), so we don't have a thread leak in our app. There are no other processes consuming lots of threads (Explorer has 35, jvisualvm has 24, iexplore has 20, ...). I don't have an exact count, but we are probably looking at maybe 300 threads total for the user.
I have attempted to attach JVisualVM, but it fails to connect to the process (probably because of thread exhaustion). From the metrics I can obtain from JVisualVM, the number of Java threads is about 22 live and 11 daemon.
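Since JVisualVM won't reliably attach, one option is to have the app log its own thread counts via ThreadMXBean. A minimal sketch (the class name and the 60-second interval are just placeholders); the peak count in particular would catch a transient spike in threads even if it happens while nobody is watching:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ThreadCountLogger {
    public static void start() {
        final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                // live includes daemon threads; peak is the high-water mark since JVM start (or the last reset)
                // Note: these are JVM-managed threads only, not every native thread in the process.
                System.out.println("threads live=" + threads.getThreadCount()
                        + " daemon=" + threads.getDaemonThreadCount()
                        + " peak=" + threads.getPeakThreadCount()
                        + " totalStarted=" + threads.getTotalStartedThreadCount());
            }
        }, 0, 60, TimeUnit.SECONDS);
    }
}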
The heap is well behaved: it is 500MB in size, with about 250MB actually used.
The process is launched with -Xmx512m
Our process is showing Memory usage (in Task Manager) of 597,744K.
The workstation has 8GB RAM, of which only 3.8-4.0GB is used (I know, a 32-bit process can't access all of that, but there's still plenty).
Using VMMap, I can see that the stack is 49,920KB in size with 2,284KB committed.
The process shows 5,358KB free, and the largest allocatable block in the free list is 1,024KB.
I used Resource Monitor, and it shows Commit (KB) of 630,428, Working Set (KB) of 676,996, Shareable (KB) of 79,252, and Private (KB) of 597,744.
I am at a complete loss as to what is going on here. I've read a ton of articles on this, and it sounds like on some Linux systems, there is a per-user thread limit that can cause problems (but this is not Linux, and the problems described in other articles usually talk about needing thousands of threads - definitely not our case here).
If our heap were really big, I could see that eating into the space available for threads, but 500MB seems like a very reasonable and small heap (especially for a workstation with 8GB RAM).
So I've pretty much exhausted everything I know to do - does anyone have any additional pointers about what might be going on here?
EDIT 1:
I found this interesting article: Eclipse crashes with "Unable to create new native thread" - any ideas? (my settings and info inside)
They are suggesting that stack size could be the problem.
This question: where to find default XSS value for Sun/Oracle JVM? - links to Oracle documentation saying that the default stack size is 512KB. So if my app has about 40 threads, we are looking at roughly 20MB of stack on top of the 500MB heap. This all seems to be well within normal bounds for a 32-bit Java process.
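As an aside, if the per-thread stack did turn out to matter, it could be reduced globally with -Xss at launch, or per thread via the four-argument Thread constructor (the JavaDoc describes that stackSize argument as a hint that some platforms ignore). A throwaway sketch, just for illustration:

// Sketch: ask for a 256KB stack for one worker thread instead of the ~512KB default.
Runnable work = new Runnable() {
    @Override
    public void run() {
        // ... background processing ...
    }
};
Thread worker = new Thread(null, work, "small-stack-worker", 256 * 1024);
worker.start();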
So that leaves me with two possibilities that I can think of:
- Some transient condition is causing a huge number of threads to be created (but those threads are discarded before we have a chance to do diagnostics)
- Memory fragmentation is killing us for some reason. It is interesting that the largest allocatable block (per VMMap) is 1MB; that doesn't seem like very much... On another machine where things are working fine, the largest allocatable block is 470MB...
So, are there any pointers about how to check for memory fragmentation?
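The crudest in-process check I can think of is to probe for the largest single contiguous native allocation that still succeeds, using direct ByteBuffers (a sketch; it would need -XX:MaxDirectMemorySize raised, say to 1g, so the JVM's own direct-buffer cap doesn't mask the result, and it has to run inside the affected process rather than a fresh one). I'd still welcome better tooling:

import java.nio.ByteBuffer;

public class FragmentationProbe {
    // Call from inside the affected process, e.g. when the OOM is first seen.
    public static void logLargestContiguousBlock() {
        // A direct buffer is one contiguous native allocation, so probe downward
        // from 1GB, halving each time, until an allocation succeeds.
        for (int size = 1024 * 1024 * 1024; size >= 1024 * 1024; size /= 2) {
            try {
                ByteBuffer.allocateDirect(size);
                System.out.println("Largest contiguous native block is at least "
                        + (size / (1024 * 1024)) + "MB");
                return;
            } catch (OutOfMemoryError e) {
                // No contiguous block of this size; try smaller.
            }
        }
        System.out.println("Could not allocate even 1MB contiguously");
    }
}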
EDIT 2:
The article linked to by @mikhael ( http://blog.egilh.com/2006/06/2811aspx.html ) gives some rough calculations for the allowed number of threads on a 32-bit JVM.
I'm going to assume:
- OS process space limit: 2GB
- Modern JVM requires: 250MB (this is a big assumption - I just doubled what was in the linked article)
- Stack size (default Oracle): 512KB
- Heap: 512MB
- PermGen: can't remember exactly, but it was certainly less than 100MB, so let's just use that
So I have a worst case scenario of: (2GB - .25GB - .5GB - .1GB) / .0005GB = 2300 threads
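To sanity check that estimate on the affected workstation, I could run a throwaway probe with the same -Xmx512m setting that just creates parked threads until the OOM hits (a sketch):

import java.util.concurrent.CountDownLatch;

public class ThreadLimitProbe {
    public static void main(String[] args) {
        final CountDownLatch parkForever = new CountDownLatch(1);
        int created = 0;
        try {
            while (true) {
                Thread t = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            parkForever.await(); // keep the thread (and its stack) alive
                        } catch (InterruptedException ignored) {
                        }
                    }
                });
                t.setDaemon(true);
                t.start();
                created++;
            }
        } catch (OutOfMemoryError e) {
            System.out.println("Created " + created + " threads before: " + e);
        }
    }
}

If the number that comes back on the failing workstation is dramatically lower than on a healthy one, that would point at address space (or fragmentation) rather than our own thread usage.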
EDIT 3:
Info I should have included originally: the application runs fine for a good while (like 24 to 48 hours) before this problem happens. The application does continuous background processing, so it has very little idle time. Not sure if that's important or not...
EDIT 4:
More info: Looking at a VMMap capture from another failure, I'm seeing native heap exhaustion.
The heap size is 1.2GB, with only 59.8MB committed.
Either the Java runtime is the problem here, or maybe there's some issue with native resources not being released properly - like a memory mapped file that isn't getting released?
We do use memory mapped files, so I'll put my focus on those.
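We map files in roughly the usual way (sketch below; names are placeholders). The detail I want to verify is that each mapping reserves virtual address space for the whole mapped region and keeps it reserved until the MappedByteBuffer is garbage collected - closing the channel or the file does not unmap it - so long-lived references to mapped buffers would quietly pin address space.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedFiles {
    public static MappedByteBuffer mapReadOnly(String path) throws IOException {
        RandomAccessFile file = new RandomAccessFile(path, "r");
        try {
            FileChannel channel = file.getChannel();
            // Reserves address space for the whole file. The mapping remains
            // valid (and the address space stays reserved) after the channel
            // and file are closed, until the buffer itself is garbage collected.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        } finally {
            file.close(); // closing does NOT unmap the buffer returned above
        }
    }
}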
EDIT 5:
I think that I've tracked the problem down to an exception that happens as follows:
java.lang.OutOfMemoryError
at java.util.zip.Deflater.init(Native Method)
at java.util.zip.Deflater.<init>(Unknown Source)
at java.util.zip.Deflater.<init>(Unknown Source)
at java.util.zip.DeflaterOutputStream.<init>(Unknown Source)
at java.util.zip.DeflaterOutputStream.<init>(Unknown Source)
at ....
For a very small handful of the streams we deflate (I have 4 examples now), the above happens. And when it happens, VMMap shows the heap of the process (not the JVM heap, but the actual native heap) spiking up to 2GB. Once that happens, everything falls apart. This is now very repeatable: running the same stream into the deflater results in the memory spike.
So, are we maybe looking at a problem with the JRE's zip library? Seems crazy to think that would be it, but I'm really at a loss.
If I take the exact same stream and run it on a different system (even running the same JRE - 32 bit, Java 7u45), we don't get the problem. I have completely uninstalled the JRE and reinstalled it without any change in behavior.
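One thing I noticed while digging (this is my reading of the JDK source, so treat it accordingly): Deflater.init() allocates its zlib state in native memory, outside the Java heap, and that memory is only freed by Deflater.end(). DeflaterOutputStream.close() ends a deflater it created itself, but not one you pass in, and a stream that is never closed leaves the native memory to the finalizer. So while I investigate the spike itself, I'm making sure every deflater is ended deterministically, roughly like this (names are placeholders):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class CompressUtil {
    public static void compress(InputStream in, OutputStream rawOut) throws IOException {
        Deflater deflater = new Deflater();
        try {
            DeflaterOutputStream out = new DeflaterOutputStream(rawOut, deflater);
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            out.finish(); // writes the compressed trailer without closing rawOut
        } finally {
            deflater.end(); // releases the native zlib memory immediately
        }
    }
}

That doesn't yet explain why only this one site and only a handful of streams trigger the 2GB spike, but at least it rules out lingering native deflater buffers on our side.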