
My WebLogic server was configured with 16 GB of heap space, but the heap was 90% full within an hour of production use, once most of the users had started work. I observed several stuck threads whenever this happens.

I captured a heap dump when the heap was approximately 10% free. How do I inspect the heap dump to find the memory leak, or the process or code, that is causing this issue?

I have tried to investigate the memory leak with tools like jmap and Eclipse MAT, but, maybe due to lack of experience, I couldn't understand what these tools were trying to show me. What should I look out for, and how?

I have both before-GC and after-GC heap dumps to analyze.

I have reviewed the thread dumps. There were no threads "waiting to lock" an object; the threads were similar to the one shown below, stuck for no obvious reason.

ilovetolearn
  • You should take several thread dumps to see exactly what ExecuteThread '0' is doing and whether it is blocked on the JSP (goto.jsp). It could be the root cause of your memory leak. Ignore ExecuteThread '3', which is blocked because it is a socket muxer thread. – Emmanuel Collin Jun 02 '16 at 17:09
  • Did you check the list of loaded classes for each instance of **ChangeAwareClassLoader**? – kevin ternet Jun 04 '16 at 16:53
  • I fail to see how 227 MB could be 90% of a 16 GB heap. – Tair Jun 04 '16 at 18:13
  • @tair I did a jmap live heap dump; is that the reason? – ilovetolearn Jun 05 '16 at 00:50
  • @kevin ternet, I did go through the classes for ChangeAwareClassLoader; how do I tell if something is unusual? – ilovetolearn Jun 05 '16 at 00:50
  • @optimus if the heap dump was able to reduce a 16 GB heap to 227 MB of _live objects_, it is _very unlikely_ that you have a memory leak. – Tair Jun 05 '16 at 07:30
  • @optimus what are your JVM flags? – Tair Jun 05 '16 at 07:35
  • @tair -Xmx16g -Xms16g and -XX:NewSize=256m – ilovetolearn Jun 05 '16 at 10:07
  • @optimus if those are the only flags you have, your problem may be the GC algorithm in effect. Probably your memory fills up, then GC kicks in and stops all the threads. 16 GB of memory is worth roughly 16 seconds of GC pause. I would recommend studying the topic further, or just searching for the most recommended CMS settings for web apps. – Tair Jun 05 '16 at 15:16
  • @optimus, if you are looking to change GC algorithm, have a look at G1GC: http://stackoverflow.com/questions/8111310/java-7-jdk-7-garbage-collection-and-documentation-on-g1/34254605?s=5|0.0000#34254605 – Ravindra babu Jun 06 '16 at 18:50
  • @kevin ternet yes, it seems to be loading quite a number of classes from a specific package/module. – ilovetolearn Jun 08 '16 at 10:24
  • @Emmanuel Collin I took several thread dumps and couldn't find more details on the locks. However, Eclipse MAT does show that the memory consumption seems to have something to do with this file. – ilovetolearn Jun 08 '16 at 10:25

6 Answers


According to your heap dump, your biggest memory issue is the int arrays: they take nearly 70% of your heap (yes, sort by the Size column instead).

  1. Select it in your heap dump, right-click it and select Show in Instances View.
  2. Then browse the biggest objects; for each of them, right-click and select Show Nearest GC Root to see which object still holds a hard reference to the int array, preventing it from becoming eligible for GC.

This could help you find your memory leak, assuming it really is a memory leak.

Below is an example of the Nearest GC Root view identifying a leak that I added intentionally to my program, just to illustrate the idea. As you can see in the screenshot, I have an int array that cannot become eligible for GC because it is stored in a HashMap called leak in my class Application. So I know that my memory issue could be due to this particular HashMap, especially if many other objects also lead to this HashMap.

[Screenshot: Nearest GC Root of the int array, held by the HashMap leak in class Application]
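To make the idea concrete, here is a minimal Java sketch of the kind of intentional leak described above: int arrays that stay reachable because they are stored in a static HashMap called leak in a class Application. The handleRequest method, the array size and the loop count are hypothetical; this is an illustration, not the program behind the screenshot.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reproduction of the intentional leak: int arrays kept
// reachable through a static HashMap called "leak" in class Application.
class Application {
    // A static field stays reachable as long as the class is loaded, so
    // every value stored here survives every garbage collection.
    static final Map<Integer, int[]> leak = new HashMap<>();

    // Simulates one unit of work. The buffer is meant to be temporary,
    // but putting it in the map keeps a hard reference to it.
    static void handleRequest(int requestId) {
        int[] buffer = new int[1024]; // roughly 4 KB of int data
        buffer[0] = requestId;
        leak.put(requestId, buffer);  // the bug: entries are never removed
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 10_000; i++) {
            handleRequest(i);
        }
        System.out.println("Retained arrays: " + leak.size());
        // Keep the JVM alive so a heap dump can be taken, e.g. with jmap.
        System.out.println("Press Enter to exit");
        System.in.read();
    }
}
```

While the program waits for Enter, you would expect a heap dump of it to contain thousands of int arrays whose path to a GC root goes through Application.leak, which is the pattern to look for in your own dump.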

NB: Be patient when you try to identify a leak, as it is not always obvious. The ideal situation is a huge object that takes the whole heap, but that is obviously not your case: nothing stands out, which is why I propose investigating the int arrays first. Don't forget that it could also be small int arrays, but thousands of them with the same Nearest GC Root.

Another trick: if you have JProfiler, you can simply follow this wonderful tutorial to find your leak.

Response Update:

One simple way to better identify the root cause of the memory leak is to take at least 2 heap dumps and then compare them using a tool like jhat, with the following syntax:

jhat -J-Xmx2G -baseline ${path-to-the-first-heap-dump} ${path-to-the-second-heap-dump}

It will launch a small HTTP server on port 7000, so:

  1. Open http://localhost:7000/
  2. Then click on Show instance counts for all classes (including platform)

You will then see the list of classes ordered by the total number of new instances created. You can then use VisualVM to do what I described in the first part of my answer to find the root cause of your memory leak.

You can also use jhat directly:

  1. Select one of the top classes, then for each of them:
  2. Click one "Reference to this Object" entry
  3. Then click Exclude weak refs

You will then see the GC root of each instance, as in the next screenshot:

[Screenshot: jhat showing the GC root of each instance, with weak references excluded]

Another way is to use Eclipse Memory Analyzer, also called MAT.

  1. Open the second snapshot with it
  2. Select the Histogram view
  3. Then right-click each of the top classes
  4. Choose Merge Shortest Paths To GC Roots / Exclude All references

You will then see something like the next screenshot:

[Screenshot: MAT result of Merge Shortest Paths To GC Roots]

Nicolas Filotto

The JDK's "jmap -histo" command prints object counts and byte totals for all classes, and you can redirect that output to a text file. If you capture and compare a few of these dumps over time, you will see which classes grow continually: that is your memory leak. The overhead of -histo is much lower than that of capturing a full heap dump.

Comparing just a few dumps (like the python script detailed here) seems like too small a sample, so I wrote an open-source tool (here) that runs this jmap -histo command in the background at a regular interval. It has a live display and tracks the percentage of time that the byte count for each class is rising.
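The basic two-dump comparison can be sketched in Java (8 or later). This assumes the classic jmap -histo column layout (num, #instances, #bytes, class name) and that you have saved two histograms, taken some time apart, as text. The HistoDiff class and its methods are hypothetical names; this is not the tool mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: compare two "jmap -histo" outputs and list the classes whose
// total byte count grew. Assumes rows of the form
//   "   1:   12345   1234567  [I"   (num, #instances, #bytes, class name)
class HistoDiff {
    private static final Pattern ROW =
            Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)");

    // Maps class name -> total bytes; header, separator and Total lines are skipped.
    static Map<String, Long> parse(String histo) {
        Map<String, Long> bytesByClass = new HashMap<>();
        for (String line : histo.split("\\R")) {
            Matcher m = ROW.matcher(line);
            if (m.find()) {
                bytesByClass.merge(m.group(3), Long.parseLong(m.group(2)), Long::sum);
            }
        }
        return bytesByClass;
    }

    // Returns "class: +delta bytes" for every class that grew, largest growth first.
    static List<String> growth(String before, String after) {
        Map<String, Long> b = parse(before);
        Map<String, Long> deltas = new HashMap<>();
        parse(after).forEach((cls, bytes) -> {
            long delta = bytes - b.getOrDefault(cls, 0L);
            if (delta > 0) {
                deltas.put(cls, delta);
            }
        });
        List<String> classes = new ArrayList<>(deltas.keySet());
        classes.sort((x, y) -> Long.compare(deltas.get(y), deltas.get(x)));
        List<String> result = new ArrayList<>();
        for (String cls : classes) {
            result.add(cls + ": +" + deltas.get(cls) + " bytes");
        }
        return result;
    }
}
```

Classes that keep appearing at the top across several comparisons are the leak candidates; a single pair of histograms can be noisy, which is the point made above about sampling over time.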


It seems you probably have a memory leak. Your best approach is to use Java Mission Control with Flight Recorder to find the class and method that are leaking.

You should set up your WebLogic managed server with the following parameters:

-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.port=8999 
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false 
-XX:+UnlockCommercialFeatures 
-XX:+FlightRecorder

When you set this up, follow the instructions here to detect the leak.
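With the JMX flags above in place, the same port can also be polled from a small client to watch heap usage while users are working. This is a minimal sketch, assuming the server is reachable at localhost:8999 with SSL and authentication disabled exactly as in those flags; the RemoteHeapMonitor class is hypothetical, and it uses only the standard javax.management.remote and java.lang.management APIs.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: read the heap usage of a remote JVM over the JMX port opened by
// -Dcom.sun.management.jmxremote.port=8999 (host and port are assumptions).
class RemoteHeapMonitor {

    // Percentage of the maximum heap in use; falls back to the committed
    // size when the maximum is undefined (getMax() returns -1).
    static double usedPercent(MemoryUsage heap) {
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return 100.0 * heap.getUsed() / limit;
    }

    static MemoryUsage remoteHeap(String host, int port) throws IOException {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            return memory.getHeapMemoryUsage();
        }
    }

    public static void main(String[] args) throws IOException {
        MemoryUsage heap = remoteHeap("localhost", 8999);
        System.out.printf("Heap used: %d MB (%.1f%%)%n",
                heap.getUsed() / (1024 * 1024), usedPercent(heap));
    }
}
```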

Hope it helps !!

  • My JDK seems to have a problem understanding -XX:+UnlockCommercialFeatures -XX:+FlightRecorder – ilovetolearn Jun 04 '16 at 11:13
  • That's correct. My mistake!! Those parameters are legacy from the old JRockit JVM. You can remove them from your startup arguments. No problem there. – Ilan Salviano Jun 04 '16 at 13:17
  • I was using JDK 6, and that's why I couldn't enable JFR. I have upgraded to JDK 7. – ilovetolearn Jun 05 '16 at 00:49
  • I have enabled JFR; how do I find anomalies? – ilovetolearn Jun 05 '16 at 00:53
  • Your best move is to follow the instructions described [here](https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks001.html). Basically, you should start your application server and let JFR record until it throws an OutOfMemoryError. With this recording, take a look at the Allocation tab and filter for your package structure to check the objects being allocated, their number of samples and the space they take. This should give you a clue as to which object is bursting its boundaries. – Ilan Salviano Jun 06 '16 at 12:29
  • @Ilan Salviano I have not encountered an out-of-memory error in my production environment because we trigger GC on the heap manually. However, I am trying to track down memory leaks in our test environment. – ilovetolearn Jun 07 '16 at 02:00
  • Hi @optimus, were you able to test it in the test environment? – Ilan Salviano Jun 08 '16 at 12:53
  • I am not able to tell any difference. All activity seems normal to me in the test environment. Am I lacking the load needed to trigger the enormous memory consumption? – ilovetolearn Jun 09 '16 at 06:21
  • Probably, yes. That's why JFR is a good option for you: you can plug it into production while it is running and record everything. The CPU overhead for this is around 3%. – Ilan Salviano Jun 09 '16 at 12:31

I am one of the developers of the tool called Plumbr. Among other things, it automatically analyzes heap contents in case of excessive memory usage. You may find it useful.

Nikem

Per your comments: you have Java 7 with a 16 GB heap and no GC algorithm explicitly specified, so the Java 7 default, the Throughput GC, is in effect. It is not suitable for most web apps, because it leads to long GC pauses with big heaps.

Switch to the ConcurrentMarkSweep (CMS) GC. That way, GC will not wait until your memory fills up and will try its best to collect garbage concurrently with your application, so that you will have fewer long Stop The World pauses.
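To confirm which collector is actually in effect, you can list the JVM's garbage collector MXBeans from inside the application. This is a minimal sketch (the GcReport class is hypothetical); the name-to-algorithm mapping in the comment reflects how HotSpot names its collectors. On Java 7, CMS is enabled with -XX:+UseConcMarkSweepGC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: show which collectors the running JVM uses and how much time
// they have spent. On HotSpot the bean names identify the algorithm:
//   "PS Scavenge" / "PS MarkSweep"                -> Parallel (throughput) GC
//   "ParNew" / "ConcurrentMarkSweep"              -> CMS
//   "G1 Young Generation" / "G1 Old Generation"   -> G1
class GcReport {
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time may be -1 if the collector does not report them.
            lines.add(String.format("%s: %d collections, %d ms total",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

With the Java 7 throughput default you would expect PS Scavenge and PS MarkSweep in the output; a large total time on the old-generation collector is consistent with the long pauses described above.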

Tair

Did you try the YourKit profiler? It's not free, but you can evaluate it for 30 days. If your dump contains all objects (not only live ones), you will be able to check their roots as well. That matters because it could be that you don't have a memory leak, but rather too big a memory footprint. It would also be great to enable GC logs and count how many Full GC pauses you have:

grep "Full GC" jvm_gc.log | wc -l

In an ideal world it should be 0 :)
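Regarding dumps that contain all objects rather than only live ones: besides jmap, a HotSpot JVM can write a dump of itself through the HotSpotDiagnosticMXBean, and its live flag makes the difference explicit. This is a minimal sketch; the HeapDumper class and the file names are hypothetical, and recent JDK versions reject file names that do not end in .hprof.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Sketch: write a heap dump from inside the JVM (HotSpot only).
class HeapDumper {
    // live = true:  only reachable objects are written (a full GC runs first),
    //               similar to jmap -dump:live
    // live = false: all objects, including garbage not yet collected
    static void dump(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws an IOException if the file already exists.
        diagnostics.dumpHeap(path, live);
    }

    public static void main(String[] args) throws IOException {
        String path = "all-objects.hprof";
        dump(path, false);
        System.out.println("Wrote " + new File(path).length() + " bytes to " + path);
    }
}
```

Because a live dump collects garbage first, it can be far smaller than the used heap, which is consistent with the 227 MB live dump discussed in the comments on the question.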

By the way, this whole article could be helpful for you.

Jimilian
  • How do I tell the difference between a memory leak and a big memory footprint? And how do I find the code that is consuming lots of memory? – ilovetolearn Jun 08 '16 at 09:23
  • @optimus, memory leak: you never clean the objects up, so they are still reachable, and all you need is to find their root. Big footprint: you have a lot of the same objects, but they are already unreachable from any root, so you need to find the same kind of object in the reachable scope to find a root. – Jimilian Jun 08 '16 at 09:54
  • @optimus, once you have found the root in YourKit, you need to click "QuickInfo" to see the stack trace where this object was created. – Jimilian Jun 09 '16 at 06:37
  • Thanks for the recommendation of YourKit. I am using MAT at the moment. – ilovetolearn Jun 09 '16 at 07:14
  • @optimus it means that you don't have big problems with memory. Or leaks, at least. – Jimilian Jun 10 '16 at 09:11
  • I was using -Xms16g -Xmx16g -XX:NewSize=256M -XX:MaxNewSize=256M previously and the heap used space often reached 80-90%. After switching to -Xms16g -Xmx16g -XX:NewSize=2g -XX:MaxNewSize=2g, the heap used space is only 30-40%. What is going on? – ilovetolearn Jun 11 '16 at 07:48
  • @optimus It's much cheaper to clean up the young generation than the old one. So if you have a lot of objects with a short life cycle, it's better to clean them up as early as possible. – Jimilian Jun 11 '16 at 09:16
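Jimilian's distinction between a leak and a big footprint can be turned into a rough check. Right after a full GC, mostly reachable objects remain, so the heap used at that moment approximates the live set. If that floor keeps rising, reachable objects are accumulating (a leak); if it stays roughly flat, you are looking at a large but stable footprint. This is a heuristic sketch: the LeakTrend class and its thresholds are arbitrary assumptions, and the samples could come from your GC logs or from MemoryPoolMXBean.getCollectionUsage() on the old-generation pool.

```java
// Heuristic sketch (not a definitive test): given heap-used values sampled
// right after full GCs, a leak keeps raising the floor, while a large but
// stable footprint stays roughly flat.
class LeakTrend {
    // Fraction of consecutive sample pairs where the post-GC floor went up.
    static double risingFraction(long[] usedAfterGc) {
        if (usedAfterGc.length < 2) {
            return 0.0;
        }
        int rising = 0;
        for (int i = 1; i < usedAfterGc.length; i++) {
            if (usedAfterGc[i] > usedAfterGc[i - 1]) {
                rising++;
            }
        }
        return (double) rising / (usedAfterGc.length - 1);
    }

    // Flags a likely leak when the floor rises in at least 75% of intervals
    // and the last sample is more than 20% above the first (arbitrary thresholds).
    static boolean looksLikeLeak(long[] usedAfterGc) {
        if (usedAfterGc.length < 2) {
            return false;
        }
        long first = usedAfterGc[0];
        long last = usedAfterGc[usedAfterGc.length - 1];
        return risingFraction(usedAfterGc) >= 0.75 && last > first * 1.2;
    }
}
```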