java heap and thread analysis for memory leak

Question

My WebLogic server is configured with 16 GB of heap space, but 90% of it was in use within 1 hour of production usage, once most of the users had started work. I observed several stuck threads whenever this happened.

I captured a heap dump when the heap was approximately 10% free. How do I inspect the heap dump to find the memory leak, or the process or code that is causing this issue?

I have tried to understand the memory leak by running tools like jmap and Eclipse MAT but, perhaps due to lack of experience, I couldn't understand what these tools were trying to show. How should I read them, and what should I look out for?

I have both the before-GC and after-GC heap dumps to analyze.

I have reviewed the thread dumps: no threads were "waiting to lock" an object. The threads looked similar to those shown below, stuck for no obvious reason.

You should take several thread dumps to see exactly what ExecuteThread '0' is doing and whether it is blocked on the JSP (goto.jsp). It could be the root cause of your memory leak. Do not consider ExecuteThread '3', which is blocked because it is a socket muxer thread. – Emmanuel Collin Jun 02 '16 at 17:09
Did you check the list of loaded classes for each instance of **ChangeAwareClassLoader**? – kevin ternet Jun 04 '16 at 16:53
@kevin ternet, I did go through the classes for ChangeAwareClassLoader. How do I tell if something is unusual? – ilovetolearn Jun 05 '16 at 00:50
@optimus if the heap dump was able to reduce a 16 GB heap to 227 MB of _live objects_, it is _very unlikely_ that you have a memory leak – Tair Jun 05 '16 at 07:30
@optimus if those are the only flags you have, your problem may be the GC algorithm in effect. Probably your memory fills up, then GC kicks in and stops all the threads. 16 GB of memory is worth ~16 seconds of GC pause. I would recommend studying the topic more, or just googling for the most recommended CMS settings for web apps – Tair Jun 05 '16 at 15:16
@optimus, if you are looking to change the GC algorithm, have a look at G1GC: http://stackoverflow.com/questions/8111310/java-7-jdk-7-garbage-collection-and-documentation-on-g1/34254605?s=5|0.0000#34254605 – Ravindra babu Jun 06 '16 at 18:50
@kevin ternet yes, it seems to be loading quite a number of classes from a specific package/module. – ilovetolearn Jun 08 '16 at 10:24
@Emmanuel Collin I took several thread dumps and couldn't find more details on the locks. However, Eclipse MAT does tell me that the memory consumption seems to have something to do with this file. – ilovetolearn Jun 08 '16 at 10:25

Nicolas Filotto · Accepted Answer · 2019-12-03T08:10:36.797

3

According to your heap dump, your biggest memory consumer is the int arrays: they take up nearly 70% of your heap (yes, sort by the Size column instead).

Select it in your heap dump, right-click and select Show in Instances View.
Then browse the biggest objects and, for each of them, right-click and select Show Nearest GC Root to see which object still holds a hard reference to the int array and thus prevents it from becoming eligible for GC.

It could help you to find your memory leak assuming that it is a memory leak.

Below is an example of Nearest GC Root identifying a leak that I added intentionally to my program just to show the idea. As you can see in the screenshot, I have an int array that cannot become eligible for GC because it is stored in a HashMap called leak in my class Application, so I know that my memory issue could be due to this particular HashMap, especially if many other objects also lead back to it.
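For readers without the screenshot, here is a minimal sketch of what such an intentional leak might look like. Only the class name Application and the map name leak come from the description above; the static modifier, the handleRequest method, and the array size are assumptions made for illustration, not the actual program.

```java
import java.util.HashMap;
import java.util.Map;

final class Application {
    // Assumption: the map is static, so it stays reachable from a GC root
    // (the loaded class) for as long as the class itself is loaded.
    static final Map<Integer, int[]> leak = new HashMap<>();

    // Hypothetical request handler: every call parks a new int[] in the map
    // and nothing ever removes it, so the arrays can never be collected.
    static void handleRequest(int id) {
        leak.put(id, new int[1024]);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequest(i);
        }
        // All 1000 arrays are still strongly reachable through the map.
        System.out.println("retained arrays: " + leak.size());
    }
}
```

In a heap dump of such a program, Show Nearest GC Root on any of these int arrays should lead back to the leak map, which is the pattern to look for in a real dump.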

NB: Be patient when you try to identify a leak, as it is not always obvious. The ideal situation is a single huge object that takes up the whole heap, but that is not your case: nothing really stands out, which is why I propose investigating the int arrays first. Don't forget that the culprit could also be thousands of small int arrays that share the same Nearest GC Root.

Another trick: if you have JProfiler, you can simply follow this wonderful tutorial to find your leak.

Response Update:

One simple way to better identify the root cause of the memory leak is to take at least two heap dumps and then compare them using a tool like jhat with the syntax:

jhat -J-Xmx2G -baseline ${path-to-the-first-heap-dump} ${path-to-the-second-heap-dump}

It will launch a small HTTP server on port 7000, so:

Launch http://localhost:7000/
Then click on Show instance counts for all classes (including platform)

You will then see the list of Classes ordered by total amount of new instances created. You can then use VisualVM to do what I described in the first part of my answer to find the root cause of your memory leak.

You can also use jhat:

Select one of the top classes, then for each of them
click on one "Reference to this Object"
then click on Exclude weak refs

You will then see the GC root of each instance, as in the next screenshot:

Another way is to use Eclipse Memory Analyzer also called MAT.

Open the second snapshot with it
Select the Histogram view
Then right-click each of the top classes
Choose Merge Shortest Paths To GC Roots / Exclude All references

You will then see something like the next screenshot:

edited Dec 03 '19 at 08:10

answered Jun 06 '16 at 19:41

Nicolas Filotto


I have uploaded my instance view of int (http://imgur.com/IAKt5Zm) and there is no reference – ilovetolearn Jun 07 '16 at 00:27
Keep going with the other int arrays: select each of them, then right-click and choose Show Nearest GC Root, and try to find several int arrays with the same GC root – Nicolas Filotto Jun 07 '16 at 07:37
@Nicholas Filotto almost every int array has a different GC root and most of them are empty. – ilovetolearn Jun 08 '16 at 01:36
I opened the heap dump and found char[] as the biggest dominator, but in VisualVM it is int[] – ilovetolearn Jun 09 '16 at 02:17
I have looked at the code, and the implementation uses threading. Could that be causing ChangeAwareClassLoader to store so many instances of int[]? Based on this https://stackoverflow.com/questions/6470651/creating-a-memory-leak-with-java?rq=1 – ilovetolearn Jun 09 '16 at 07:12
I'm wondering if it is really a memory leak, do you get the trend that I describe in this answer http://stackoverflow.com/a/37685111/1997376? – Nicolas Filotto Jun 09 '16 at 07:36
My GC pattern looks like http://imgur.com/SdasHkW , http://imgur.com/AWpsqrJ and http://imgur.com/4o6NOmC so it doesn't seem that I have a memory leak – ilovetolearn Jun 10 '16 at 02:14
Yes indeed, it is a normal trend even if it can be surprising – Nicolas Filotto Jun 10 '16 at 06:33
I have an initial heap of 16 GB with a 256 MB new size. Now I have increased the new size to 2 GB and it seems the pargen space is less than 5% used. Could it be due to the allocation of the new size? – ilovetolearn Jun 10 '16 at 06:51
Let's be clear about terminology first: by "new size" do you mean Young Generation size, and by "pargen space" do you mean Permanent Generation? – Nicolas Filotto Jun 10 '16 at 08:09
Sorry, I was referring to Young Generation space and Old Generation space. – ilovetolearn Jun 10 '16 at 09:05
Which JDK do you use, and what exact JVM parameters did you set? – Nicolas Filotto Jun 10 '16 at 09:33
I used the following configuration: -Xms16g -Xmx16g -XX:NewSize=2g -XX:MaxNewSize=2g -XX:PermSize=512m -XX:MaxPermSize=1g – ilovetolearn Jun 11 '16 at 07:39
I am using Oracle JDK 6 – ilovetolearn Jun 11 '16 at 07:40
I was using -Xms16g -Xmx16g -XX:NewSize=256M -XX:MaxNewSize=256M previously and the heap used space often reached 80-90%. After switching to -Xms16g -Xmx16g -XX:NewSize=2g -XX:MaxNewSize=2g, the heap used space is only 30-40%. What is going on? – ilovetolearn Jun 11 '16 at 07:48
frankly speaking I don't know, I guess you can post a new question for that – Nicolas Filotto Jun 11 '16 at 10:19
Hi Nicolas, thanks for your guidance on tracing the memory leaks. It has helped me in the troubleshooting =) – ilovetolearn Jun 11 '16 at 10:37
Very happy to hear that – Nicolas Filotto Jun 11 '16 at 10:37

Erik Ostermueller · Answer 2 · 2016-12-11T07:47:17.230

The JDK's "jmap -histo" command will dump object counts/bytes for all classes to a text file. If you capture/compare a few of these dumps over time, you will see which ones grow continually -- your memory leak. The overhead of -histo is much lower than that of capturing a full heap dump.
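The comparison described above can be sketched as a small Java program. This is only an illustration under assumptions: it expects two text files saved from jmap -histo, relies on the usual row layout (rank followed by a colon, instance count, byte count, class name), and needs Java 8 or later to run. The class and method names (HistoDiff, parseBytes, growth) are hypothetical and are not part of any JDK tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HistoDiff {
    // Matches histogram rows such as "   1:   123456   98765432  [I".
    // Header, separator and "Total" lines have no "rank:" prefix and are skipped.
    private static final Pattern ROW =
            Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+).*$");

    // Returns a map of class name to total bytes for one histogram.
    static Map<String, Long> parseBytes(List<String> lines) {
        Map<String, Long> bytes = new HashMap<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (m.matches()) {
                bytes.merge(m.group(3), Long.parseLong(m.group(2)), Long::sum);
            }
        }
        return bytes;
    }

    // Returns, for every class in the later histogram, its byte growth
    // relative to the earlier one (classes absent earlier count from 0).
    static Map<String, Long> growth(List<String> before, List<String> after) {
        Map<String, Long> earlier = parseBytes(before);
        Map<String, Long> delta = new HashMap<>();
        parseBytes(after).forEach(
                (cls, b) -> delta.put(cls, b - earlier.getOrDefault(cls, 0L)));
        return delta;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HistoDiff <earlier-histo.txt> <later-histo.txt>
        Map<String, Long> delta = growth(
                Files.readAllLines(Paths.get(args[0])),
                Files.readAllLines(Paths.get(args[1])));
        // Print the 20 classes whose byte count grew the most.
        delta.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(20)
                .forEach(e -> System.out.printf("%+,d bytes  %s%n", e.getValue(), e.getKey()));
    }
}
```

A class that stays at the top of this list across several pairs of histograms is a reasonable first suspect, while a single pair can easily be skewed by garbage that has not been collected yet.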

Comparing just a few dumps (like the python script detailed here) seems like too small of a sample, so I wrote an open-source tool (here) that runs this jmap -histo command in the background (at an interval). It has a live display and tracks the % of time that the byte count for each class is on the rise.

score 1 · Answer 3 · answered Jun 02 '16 at 16:56

It seems you probably have a memory leak. Your best approach is to use Java Mission Control with Flight Recorder to find the class and method that are leaking.

You should set up your WebLogic managed server with the following parameters:

-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.port=8999 
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false 
-XX:+UnlockCommercialFeatures 
-XX:+FlightRecorder

When you set this up, follow the instructions here to detect the leak.

Hope it helps !!

answered Jun 02 '16 at 16:56

Ilan Salviano


My JDK seems to have a problem understanding -XX:+UnlockCommercialFeatures -XX:+FlightRecorder – ilovetolearn Jun 04 '16 at 11:13
That's correct. My mistake !! Those parameters are legacy from the old JRockit JVM. You can remove them from your startup arguments. No problem with that. – Ilan Salviano Jun 04 '16 at 13:17
I was using JDK 6 and that's why I couldn't enable JFR. I have upgraded to JDK 7. – ilovetolearn Jun 05 '16 at 00:49
I have enabled JFR. How do I find anomalies? – ilovetolearn Jun 05 '16 at 00:53
Your best move is to follow the instructions described [here](https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks001.html). Basically, you should start your application server and let JFR record until it throws the OutOfMemoryError. With this recording, you should take a look at the Allocation tab and filter for your package structure to check the objects being allocated, their number of samples and the space taken. This should give you a clue as to which object is bursting its boundaries. – Ilan Salviano Jun 06 '16 at 12:29
@Iian Salviano I have not encountered an out-of-memory error in my production environment because we manually GC the heap space. However, I am trying to track memory leaks in our test environment. – ilovetolearn Jun 07 '16 at 02:00
Hi @optimus, were you able to test it in the test env.? – Ilan Salviano Jun 08 '16 at 12:53
I am not able to tell any difference. All activity seems normal to me in the test environment. Am I lacking the load needed to trigger the enormous memory consumption? – ilovetolearn Jun 09 '16 at 06:21
Probably yes. That's why JFR is a good option for you, since you can plug it into production while it is running and record everything. CPU overhead for this runs around 3%. – Ilan Salviano Jun 09 '16 at 12:31

score 1 · Answer 4 · answered Jun 04 '16 at 13:46

I am one of the developers of the tool called Plumbr. Among other things we make an automatic analysis of heap contents in case of excessive memory usage. You may find it useful.

answered Jun 04 '16 at 13:46

Nikem


I would like to use "free" tools instead. – ilovetolearn Jun 05 '16 at 00:48
Totally understandable :) But if this is a one-off problem, take our free 14-day trial and solve your problem with no strings attached. – Nikem Jun 05 '16 at 06:12
Thanks for the offer, but I would probably rather learn the skills than depend on a tool to look for memory leaks. – ilovetolearn Jun 05 '16 at 11:00
Plumbr no longer exists. – roronoa_zoro Oct 13 '22 at 18:35
@roronoa_zoro yep, we got bought out :) – Nikem Oct 14 '22 at 19:19

score 1 · Answer 5 · answered Jun 05 '16 at 15:36

Per your comments, you have Java 7 with a 16 GB heap and no GC algorithm explicitly specified. The default for Java 7 is the Throughput GC, which is not suitable for most web apps because it leads to long GC pauses on big heaps.

Switch to the ConcurrentMarkSweep GC. That way the GC will not wait until your memory fills up and will try its best to collect garbage incrementally, so that you will have fewer Stop The World pauses.
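To confirm which collector a running JVM actually uses, before and after changing the flags, one option is to ask the JVM itself through the standard java.lang.management API. The sketch below is only an illustration and the class name GcInfo is hypothetical. On HotSpot, the throughput collector typically reports names such as PS Scavenge and PS MarkSweep, while CMS (enabled with -XX:+UseConcMarkSweepGC on the JDK versions discussed here) typically reports ParNew and ConcurrentMarkSweep. The exact names can vary by JVM version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcInfo {
    // Returns the names of the garbage collectors active in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints one line per collector with its cumulative count and time so far.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

This reports on the JVM it runs in, so for an application server the same calls would have to run inside the server (for example from a diagnostic page) or be read remotely over JMX.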

answered Jun 05 '16 at 15:36

Tair


is ConcurrentMarkSweep GC available in JDK 6? – ilovetolearn Jun 08 '16 at 10:24
@optimus yes, I remember using it even on JDK 5 – Tair Jun 08 '16 at 10:26
Cool. I will read up on it and propose to my product manager that we use CMS GC instead of the default parallel GC – ilovetolearn Jun 08 '16 at 11:10

score 1 · Answer 6 · answered Jun 08 '16 at 06:50

Did you try the YourKit profiler? It's not free, but you can evaluate it for 30 days. If your dump contains all objects (not only live ones), you will be able to check roots for them as well, because it could be that you don't have a memory leak but simply too big a memory footprint. It would also be great to enable GC logs and count how many Full GC pauses you have:

grep "Full GC" jvm_gc.log | wc -l

In an ideal world it should be 0 :)
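Where grep is not available (for example on a Windows server), the same count can be sketched in Java. This is a minimal equivalent of the command above, not a GC log parser: it counts the lines that contain the text "Full GC". The class name FullGcCounter is hypothetical, the default file name jvm_gc.log is taken from the command above, and Java 8 or later is assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

final class FullGcCounter {
    // Counts the lines that mention "Full GC", like grep "Full GC" | wc -l.
    static long countFullGc(List<String> logLines) {
        long count = 0;
        for (String line : logLines) {
            if (line.contains("Full GC")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // The log path may be passed as the first argument.
        String path = args.length > 0 ? args[0] : "jvm_gc.log";
        System.out.println(countFullGc(Files.readAllLines(Paths.get(path))));
    }
}
```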

Btw, this whole article could be helpful for you.

answered Jun 08 '16 at 06:50

Jimilian


How do I tell the difference between a memory leak and a big memory footprint? And how do I find the code that is consuming lots of memory? – ilovetolearn Jun 08 '16 at 09:23
@optimus, memory leak: you never clean the objects up, so they remain reachable, and all you need to do is find a root. Big footprint: you have a lot of the same kind of object, but they are already unreachable from any root, so you need to find the same kind of object in reachable scope in order to find a root. – Jimilian Jun 08 '16 at 09:54
@optimus, once you have found the root in YourKit, you need to click on "QuickInfo" to see the stack trace where this object was created. – Jimilian Jun 09 '16 at 06:37
thanks on the recommendation for yourkit. I am using MAT at the moment. – ilovetolearn Jun 09 '16 at 07:14
@optimus it means that you don't have big problems with memory. Or leaks, at least. – Jimilian Jun 10 '16 at 09:11
I was using -Xms16g -Xmx16g -XX:NewSize=256M -XX:MaxNewSize=256M previously and the heap used space often reached 80-90%. After switching to -Xms16g -Xmx16g -XX:NewSize=2g -XX:MaxNewSize=2g, the heap used space is only 30-40%. What is going on? – ilovetolearn Jun 11 '16 at 07:48
@optimus It's much cheaper to clean up the young generation than the old one. So if you have a lot of objects with a short life cycle, it's better to clean them up as early as possible. – Jimilian Jun 11 '16 at 09:16
