
We have a widely deployed application (several hundred workstations run it). At one site, and only one site (our product is deployed into many different environments), we randomly get the following error:

java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Unknown Source)

The operating system is Windows 7 64-bit. We are running a 32-bit JVM (1.7.0_45).

Windows Task Manager shows that the process has 39 native threads (not very many), so we don't have a thread leak in our app. No other process is consuming lots of threads either (Explorer has 35, jvisualvm has 24, iexplore has 20, and so on). I don't have an exact count, but the user's total is probably around 300 threads.

I have attempted to attach JVisualVM, but it fails to connect to the process (probably because of thread exhaustion). From the metrics I can obtain from JVisualVM, though, there are about 22 live threads and 11 daemon threads.

The heap is well behaved - heap is 500MB with 250MB actually used.

The process is launched with -Xmx512m

Our process is showing Memory usage (in Task Manager) of 597,744K.

The workstation has 8GB RAM, of which only 3.8-4.0GB are used (I know, a 32 bit process won't access all of that, but there's still plenty)

I used VMMap: the stack region is 49,920 KB in size, with 2,284 KB committed.

The process shows 5,358 KB free, and the largest allocatable block in the free list is 1,024 KB.

Resource Monitor shows Commit (KB) of 630,428, Working Set (KB) of 676,996, Shareable (KB) of 79,252 and Private (KB) of 597,744.

I am at a complete loss as to what is going on here. I've read a ton of articles on this, and it sounds like some Linux systems have a per-user thread limit that can cause problems. But this is not Linux, and the problems described in those articles usually involve thousands of threads, which is definitely not our case.

If our heap were really big, I could see it eating into the space available for threads, but 500 MB seems like a very reasonable, small heap (especially on a workstation with 8 GB RAM).

So I've pretty much exhausted everything I know to do - does anyone have any additional pointers about what might be going on here?

EDIT 1:

I found this interesting article: Eclipse crashes with "Unable to create new native thread" - any ideas? (my settings and info inside)

They are suggesting that stack size could be the problem.

This question, "where to find default XSS value for Sun/Oracle JVM?", links to Oracle documentation saying that the default stack size is 512 KB. So with about 40 threads, we are looking at 20 MB of stack plus a 500 MB heap. That all seems well within normal bounds for a 32-bit Java process.

So that leaves me with two possibilities that I can think of:

  1. Some transient condition is causing a huge number of threads to be created (but those threads are discarded before we have a chance to do diagnostics)
  2. Memory fragmentation is killing us for some reason. It is interesting that the largest allocatable block (per VMMap) is only 1 MB, which doesn't seem like very much. On another machine where things are working fine, the largest allocatable block is 470 MB.
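Since JVisualVM could not attach once the problem started, one way to test possibility 1 is to have the process log its own thread counters. The sketch below is a hypothetical helper (`ThreadCountLogger` is not part of the application) built on the standard `ThreadMXBean`. The peak count and the total-started count would reveal a short burst of threads even if the burst was over before any tool could attach. It avoids lambdas so that it compiles on Java 7.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical in-process monitor for the JVM's thread counters.
final class ThreadCountLogger {

    // One line summarising the thread counters at this instant.
    static String snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        return String.format("live=%d daemon=%d peak=%d started=%d",
                mx.getThreadCount(),
                mx.getDaemonThreadCount(),
                mx.getPeakThreadCount(),          // highest live count since JVM start (or last reset)
                mx.getTotalStartedThreadCount()); // every thread ever started, including short-lived ones
    }

    // Call once at startup: the logging thread is created here, while memory is
    // still plentiful, so later snapshots do not need a new native thread.
    static ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "thread-count-logger");
                t.setDaemon(true); // do not keep the JVM alive just for logging
                return t;
            }
        });
        ses.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                System.out.println(snapshot());
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }

    public static void main(String[] args) throws InterruptedException {
        start(1);
        Thread.sleep(3500); // let a few snapshots print, then exit
    }
}
```

Calling `ThreadCountLogger.start(60)` early in `main()` would print one line a minute; in practice you would send it to the application's own log rather than stdout.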

So, are there any pointers on how to check for memory fragmentation?

EDIT 2:

The article linked by @Mikhail (http://blog.egilh.com/2006/06/2811aspx.html) gives some rough calculations for the allowed number of threads on a 32-bit JVM.

I'm going to assume:

  • OS process address space limit: 2 GB
  • Modern JVM overhead: 250 MB (this is a big assumption; I just doubled the figure in the linked article)
  • Stack size (Oracle default): 512 KB
  • Heap: 512 MB
  • PermGen: I can't remember exactly, but it was certainly less than 100 MB, so let's just use that

So my worst-case scenario is: (2 GB - 0.25 GB - 0.5 GB - 0.1 GB) / 0.0005 GB ≈ 2,300 threads
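The same back-of-the-envelope budget can be computed in code. This sketch (`ThreadBudget` is a hypothetical name) works in kilobytes so that the 512 KB stack term is exact (512 KB is about 0.0005 GB). Every input is one of the assumptions listed above, not a measured value. It also assumes the leftover address space is one usable pool, which is exactly what a small largest-free-block figure in VMMap calls into question.

```java
// Hypothetical estimate of how many thread stacks fit in a 32-bit address space.
final class ThreadBudget {

    // All sizes in kilobytes; returns the number of whole stacks that fit.
    static long maxThreads(long addressSpaceKb, long jvmOverheadKb, long heapKb,
                           long permGenKb, long stackKb) {
        long leftKb = addressSpaceKb - jvmOverheadKb - heapKb - permGenKb;
        return leftKb / stackKb; // integer division: whole stacks only
    }

    public static void main(String[] args) {
        long budget = maxThreads(
                2L * 1024 * 1024, // 2 GB user address space for a 32-bit process
                250L * 1024,      // assumed JVM overhead: 250 MB
                512L * 1024,      // -Xmx512m
                100L * 1024,      // generous PermGen guess: 100 MB
                512L);            // 512 KB default stack per thread
        System.out.println(budget + " threads"); // prints "2372 threads"
    }
}
```

With binary units the figure comes out slightly above 2,300, but either way it is far above the roughly 40 threads the process actually uses.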

EDIT 3:

Info I should have included originally: the application runs fine for a good while (24 to 48 hours) before this problem happens. It does continuous background processing, so it has very little idle time. I'm not sure whether that's important.

EDIT 4:

More info: looking at VMMap from another failure, I'm seeing native heap exhaustion.

The (native) Heap size in VMMap is 1.2 GB, with only 59.8 MB committed.

Either the Java runtime is the problem here, or maybe some issue with native resources not being released properly? Like maybe a memory mapped file that isn't getting released?

We do use memory mapped files, so I'll put my focus on those.
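If memory-mapped files are a suspect, one Java 7 detail matters: there is no public API to unmap a `MappedByteBuffer`. According to the `FileChannel.map` Javadoc, closing the channel has no effect on the mapping, and the mapping stays valid until the buffer itself is garbage-collected. In a 32-bit process, every mapping that is still reachable (or not yet collected) therefore keeps its slice of address space. The hypothetical sketch below (`MappedFileLifetime` is an illustrative name, not the application's code) shows that lifetime.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

final class MappedFileLifetime {

    // Maps the first `size` bytes of `file` read-only and returns the first byte.
    static byte firstByte(Path file, long size) throws IOException {
        MappedByteBuffer buf;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel ch = raf.getChannel()) {
            buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
        // The channel and file are closed here, but the mapping (and the address
        // space it occupies) remains valid until `buf` is garbage-collected.
        return buf.get(0);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {42, 1, 2, 3});
        System.out.println(firstByte(tmp, 4)); // prints 42
    }
}
```

Because the mapping outlives the channel, keeping references to many mapped buffers (or mapping faster than the GC reclaims them) can exhaust a 32-bit address space even while the Java heap looks healthy.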

EDIT 5:

I think I've tracked the problem down to the following exception:

java.lang.OutOfMemoryError
    at java.util.zip.Deflater.init(Native Method)
    at java.util.zip.Deflater.<init>(Unknown Source)
    at java.util.zip.Deflater.<init>(Unknown Source)
    at java.util.zip.DeflaterOutputStream.<init>(Unknown Source)
    at java.util.zip.DeflaterOutputStream.<init>(Unknown Source)
    at ....

For a very small handful of the streams we are deflating (I have 4 examples now), the above happens. When it does, VMMap shows the process heap (not the JVM heap, but the actual native heap) spiking up to 2 GB, and once that happens, everything falls apart. This is now very repeatable: running the same stream into the deflater makes the memory spike.

So, are we maybe looking at a problem with the JRE's zip library? Seems crazy to think that would be it, but I'm really at a loss.

If I take the exact same stream and run it on a different system (even running the same JRE - 32 bit, Java 7u45), we don't get the problem. I have completely uninstalled the JRE and reinstalled it without any change in behavior.

Kevin Day
  • You say "site". Is that many identical machines at one physical location that all show this behavior? – Thorbjørn Ravn Andersen Nov 13 '13 at 23:57
  • Could you share all the JVM parameters? The Xmx parameter does not have much effect on thread creation. – benjamin.d Nov 13 '13 at 23:58
  • You sure about your thread count? Any chance it is bursting above your thread counts quickly so your tools don't see it? – Gray Nov 13 '13 at 23:59
  • Also, try removing any -X options! The defaults should work fine. – Thorbjørn Ravn Andersen Nov 14 '13 at 00:00
  • see if this link helps: http://javaeesupportpatterns.blogspot.de/2012/09/outofmemoryerror-unable-to-create-new.html – Nitin Dandriyal Nov 14 '13 at 00:03
  • Is there some other process on that particular machine that might have memory spikes? – Nitin Dandriyal Nov 14 '13 at 00:12
  • You don't have a thread problem. You have a memory use problem. The error message simply says that you have run out of memory, so the JVM cannot do what you requested (create another thread). Sometimes you can put a sleep() at the beginning of your code and then attach the monitor; that way it gets going before memory is exhausted. – edharned Nov 14 '13 at 14:56
  • @ThorbjørnRavnAndersen I have added a clarification on what I mean by 'site' - our app is widely deployed into many different environments. I have one machine at one physical location with the issue, no other machines at that location run the application. – Kevin Day Nov 14 '13 at 20:29
  • You could use YourKit (http://www.yourkit.com/) and have the attached profiler take a memory snapshot (http://www.yourkit.com/docs/80/help/out_of_memory.jsp) once an OutOfMemoryError occurs. I don't want to advertise the product, as it is commercial, but it's the best Java profiler I have worked with so far. It even has a 30-day evaluation licence; I would try that out. – Ortwin Angermeier Nov 14 '13 at 23:14
  • @benjamin.d that's the only JVM parameter that we are passing in. Everything else is default. – Kevin Day Nov 14 '13 at 23:59
  • @Mikhail thanks for that link. I don't think this is the same. We aren't in a data center or multi-tenant system so we have full control over the JVM being used. We aren't leaking threads (at least jvisualvm and task manager show that we aren't). Memory usage is well behaved (500 MB heap, about 50% used). Also, this app is running at 300 different sites, and this is the only place the problem is happening. If this is a transient problem with the heap, can heap exhaustion cause native thread creation to fail? – Kevin Day Nov 15 '13 at 12:59
  • http://blog.egilh.com/2006/06/2811aspx.html - this article suggests that threads are allocated not on the heap, but between the heap and the Xmx parameter. This could be an issue. – Mikhail Nov 15 '13 at 13:13
  • @Mikhail good article, and yes, I've been looking at that. However, I'm not creating lots and lots of threads (unless 40 is a lot). I don't have the permgen size, but I remember that it was quite modest. I'll add an additional edit to the question with some back of the envelope calculations based on the article you linked to. – Kevin Day Nov 15 '13 at 13:25
  • At the risk of being too obvious and sounding silly, have you tried increasing the -Xmx parameter to, e.g., 1024 MB? – TT. Nov 15 '13 at 13:52
  • @TT. no, have not increased -Xmx... given the nature of the problem, it seems like increasing heap is the last thing we would want to do here. – Kevin Day Nov 15 '13 at 15:28
  • I'm not an expert but I sometimes see funny things going on with 32 bit apps running on a 64 bit Win. Can you check that the error also appears if you run it in a 64 bit JVM? – NoDataDumpNoContribution Nov 15 '13 at 16:20
  • @Trilarion unfortunately, no, I can't run in a 64 bit JVM - we use native libraries that are 32 bit. Hmm - that actually raises an interesting question - I wonder if one of the native libraries might be leaking... – Kevin Day Nov 15 '13 at 19:54

2 Answers


Finally figured this out.

There were a couple of data streams that we processed (4 out of 10 million at this site) that wound up creating a ton of DeflaterOutputStream objects. A 3rd-party library we were using called finish() on the stream instead of close(). finish() writes out the remaining compressed data but does not call end() on the underlying Deflater, so the Deflater's native memory was only released when its finalizer ran. As long as the load wasn't too high, there were no problems. But past a tipping point, we started running into this:

http://jira.pentaho.com/browse/PRD-3211

which led us to this:

http://bugs.sun.com/view_bug.do?bug_id=4797189

Several hours after that happened, the system finally got itself into a corner it couldn't get out of and was unable to create a native thread when we needed one.

The fix was to get the 3rd party library to close the DeflaterOutputStream.
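The difference between the leaky pattern and the fix can be sketched as follows. `DeflateSafely` and its methods are hypothetical names, and `compressLeaky` only mirrors the pattern described above (the third-party library itself isn't named). The detail that matters, true of the JDK's `DeflaterOutputStream`, is that `close()` calls `end()` on a `Deflater` the stream created itself, while `finish()` does not. If you pass in your own `Deflater`, `close()` leaves it alone, so you must call `end()` yourself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

final class DeflateSafely {

    // Leaky pattern: finish() flushes the compressed data but never ends the
    // Deflater, so its native zlib state waits for the finalizer.
    static byte[] compressLeaky(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DeflaterOutputStream out = new DeflaterOutputStream(bytes);
        out.write(data);
        out.finish();
        return bytes.toByteArray();
    }

    // Fixed pattern: close() runs finish() and then end() on the stream's own
    // default Deflater, releasing the native memory immediately.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(data);
        }
        return bytes.toByteArray();
    }

    // With a caller-supplied Deflater, close() does NOT end it; do it explicitly.
    static byte[] compressWithLevel(byte[] data, int level) throws IOException {
        Deflater deflater = new Deflater(level);
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DeflaterOutputStream out = new DeflaterOutputStream(bytes, deflater)) {
                out.write(data);
            }
            return bytes.toByteArray();
        } finally {
            deflater.end(); // frees the native zlib stream
        }
    }
}
```

All three methods produce the same compressed bytes; only the two that end the `Deflater` release its native memory deterministically. The try-with-resources form also runs `close()` when `write()` throws, which is the case a bare `finish()` call misses entirely.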

So it was definitely a native resource leak. If anyone else ever hits something like this, the VMMap tool was indispensable for eventually tracking down which data streams were causing the problem.

Kevin Day

I suspect, though it is clearly difficult to prove, that you are running into a 32-bit memory allocation problem.

To run, a thread needs a block of native memory (not heap memory), and that block has to be contiguous. While I am sure that WOW64 allows 32-bit processes to operate in the region above 4 GB, I'm not so sure about allocating native memory for a new thread above the 4 GB limit should the intervening space be used.

Hence your application and the heap sit in lowish memory, other processes take the intervening 3.07 GB (if memory serves), and the JVM then attempts to allocate a native memory block 4 GB above the initial caller in order to create a new thread.

Could you confirm that this issue only occurs when memory use is around or above the 4 GB mark?

Chaffers