
I am using Tomcat 6.0.20 and have a number of web apps running on the server. Over time, approximately every 3 days, the server needs restarting; otherwise it crashes and becomes unresponsive.

I have the following settings for the JVM:

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=c:\tomcat\Websites\private\mydomain\apache-tomcat-6.0.20\logs

This provides me with a hprof file which I have loaded using Java VisualVM which identifies the following:

byte[] 37,206   Instances | Size 86,508,978
int[] 540,909   Instances | Size 55,130,332
char[] 357,847  Instances | Size 41,690,928

The list goes on, but how do I determine what is causing these issues?

I am using New Relic to monitor the JVM and only one error seems to appear, but it's a recurring one: org.apache.catalina.connector.ClientAbortException. Is it possible that when a user session is aborted, any database connections or variables created are not being closed and are therefore left orphaned?

There is a function which is used quite heavily throughout each web app, not sure if this has any bearing on the leak:

public static String replaceCharacters(String s)
{
    s = s.replaceAll("  ", " ");
    s = s.replaceAll(" ", "_");
    s = s.replaceAll("\351", "e");
    s = s.replaceAll("/", "");
    s = s.replaceAll("--", "-");
    s = s.replaceAll("&", "and");
    s = s.replaceAll("&", "and");
    s = s.replaceAll("__", "_");
    s = s.replaceAll("\\(", "");
    s = s.replaceAll("\\)", "");
    s = s.replaceAll(",", "");
    s = s.replaceAll(":", "");
    s = s.replaceAll("\374", "u");
    s = s.replaceAll("-", "_");
    s = s.replaceAll("\\+", "and");
    s = s.replaceAll("\"", "");
    s = s.replaceAll("\\[", "");
    s = s.replaceAll("\\]", "");
    s = s.replaceAll("\\*", "");
    return s;
}

When a user connection is aborted, such as when the user closes the browser or leaves the site, are all variables, connections, etc. purged/released? Isn't the GC supposed to handle that?

Below are my JVM settings:

-Dcatalina.base=c:\tomcat\Websites\private\mydomain\apache-tomcat-6.0.20
-Dcatalina.home=c:\tomcat\Websites\private\mydomain\apache-tomcat-6.0.20
-Djava.endorsed.dirs=c:\tomcat\Websites\private\mydomain\apache-tomcat-6.0.20\endorsed
-Djava.io.tmpdir=c:\tomcat\Websites\private\mydomain\apache-tomcat-6.0.20\temp
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=c:\tomcat\Websites\private\mydomain\apache-tomcat-6.0.20\conf\logging.properties
-Dfile.encoding=UTF-8
-Dsun.jnu.encoding=UTF-8
-javaagent:c:\tomcat\Websites\private\mydomain\apache-tomcat-6.0.20\newrelic\newrelic.jar
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=c:\tomcat\Websites\private\mydomain\apache-tomcat-6.0.20\logs
-Dcom.sun.management.jmxremote.port=8086
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false vfprintf
-Xms1024m
-Xmx1536m

Am I missing anything? The server has 3 GB of RAM.

Any help would be much appreciated :-)

iggyweb
  • No easy way really. You can try taking a dump, then calling the garbage collector, then taking another dump and seeing what's hanging around. Classic memory leaks are caused by `ThreadLocal` and bad use of `static` caches. Are your apps under load or idle? – Boris the Spider Jul 29 '13 at 16:13
  • This is a live production environment. Each app uses a class which has the following functions: `public static String removeLineBreaks(String s)`, `public static String replace(String s, String s1, String s2)` and `public static String replaceCharacters(String s)`. – iggyweb Jul 30 '13 at 08:01
  • Further investigation using Eclipse Memory Analyzer shows that the two biggest issues are org.apache.catalina.loader.WebappClassLoader and org.apache.naming.resources.ResourceCache. As I am using Tomcat 6.0.20, I believe WebappClassLoader is an issue that wasn't resolved until Tomcat 7. – iggyweb Jul 31 '13 at 09:37
  • If you're not deploying/undeploying loads of times then the ClassLoader is **not** your problem. It is big because it references all your classes. If you are deploying/undeploying then you may have a ClassLoader leak; the bad news is that these are incredibly hard to track down. – Boris the Spider Jul 31 '13 at 09:38
  • After changing the JDBC driver, things have settled down. I noticed that classes are unloading: total unloaded has increased to 117 from 68 earlier this morning. Heap is still fluctuating between 300 and 600MB, approx. 8 times a minute. For 23 websites and 8 web apps I'm guessing that's not too bad. – iggyweb Jul 31 '13 at 10:10
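
The `ThreadLocal` pattern mentioned in the first comment can be sketched as follows. This is only an illustration, not code from the apps in question: `RequestContext`, `handle` and `isClean` are hypothetical names. In a container the worker threads are pooled, so a value left in a `ThreadLocal` stays reachable after the request has finished; if the value's class was loaded by the web app, it can keep that app's class loader reachable as well. Removing the value in a `finally` block avoids this.

```java
// Hypothetical sketch: RequestContext, handle() and isClean() are illustrative names.
final class RequestContext {
    // One buffer per worker thread. Pooled threads outlive the request,
    // so anything left in here stays reachable until it is removed.
    private static final ThreadLocal<StringBuilder> BUFFER = new ThreadLocal<StringBuilder>();

    static String handle(String input) {
        BUFFER.set(new StringBuilder());
        try {
            BUFFER.get().append("handled:").append(input);
            return BUFFER.get().toString();
        } finally {
            // Without this call the StringBuilder survives on the pooled thread.
            BUFFER.remove();
        }
    }

    // Exposed only so the cleanup can be checked.
    static boolean isClean() {
        return BUFFER.get() == null;
    }
}
```

Calling `RequestContext.handle("x")` returns `"handled:x"` and leaves nothing behind on the calling thread.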

3 Answers


... but how do I determine what is causing these issues?

You need to use a dump analyser that allows you to see what is making these objects reachable. Pick an object, and see what other object or objects refer to it ... and work backwards through the chains until you find either a "GC root" or some application-specific class that you recognise.

Here are a couple of references on analysing memory snapshots and memory profilers:

Once you have identified that, you've gone most of the way to identifying the source of your storage leak.


That function has no direct bearing on the leak. It certainly won't cause it. (It could generate a lot of garbage String objects ... but that's a different issue.)
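
The garbage point can be illustrated with a small sketch. `String.replaceAll` compiles its regular expression on every call, so one way to reduce the churn is to compile each pattern once and reuse it. The class name `SlugCleaner` is hypothetical, and this is a sketch under the assumption that the replacement order must stay exactly as in the question; it still creates intermediate strings and has nothing to do with the leak itself.

```java
import java.util.regex.Pattern;

// Sketch only: same replacements, in the same order, as replaceCharacters in
// the question, but each regex is compiled once instead of on every call.
// The duplicated "&" rule is listed once; the second pass could never match.
final class SlugCleaner {
    private static final String[][] RULES = {
        {"  ", " "}, {" ", "_"}, {"\351", "e"}, {"/", ""}, {"--", "-"},
        {"&", "and"}, {"__", "_"}, {"\\(", ""}, {"\\)", ""}, {",", ""},
        {":", ""}, {"\374", "u"}, {"-", "_"}, {"\\+", "and"}, {"\"", ""},
        {"\\[", ""}, {"\\]", ""}, {"\\*", ""}
    };

    private static final Pattern[] PATTERNS = new Pattern[RULES.length];

    static {
        for (int i = 0; i < RULES.length; i++) {
            PATTERNS[i] = Pattern.compile(RULES[i][0]);
        }
    }

    static String replaceCharacters(String s) {
        for (int i = 0; i < PATTERNS.length; i++) {
            s = PATTERNS[i].matcher(s).replaceAll(RULES[i][1]);
        }
        return s;
    }
}
```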

Stephen C
  • I am using Java VisualVM to view the heap dump. If I select int[] there are 534,335 instances, if I select one of those instances it says there are <500 instances> with a value and if I select one of those instances I get no other information. Am I doing something wrong? – iggyweb Jul 30 '13 at 10:13
  • I have noticed Abandoned connection cleanup threads which relate to com.mysql.jdbc.NonRegisteringDriver$1.run(NonRegisteringDriver.java:93) is this related? – iggyweb Jul 30 '13 at 11:35
  • Think I may have stumbled on something here, I am using mysql-connector-java-5.1.21-bin.jar which I believe has an issue causing Tomcat not to release abandoned threads. Considering using mysql-connector-java-5.1.25-bin.jar instead. – iggyweb Jul 30 '13 at 11:47
  • That is certainly a plausible explanation. Try it and see if changing the JAR helps. It should be a "low risk" change. However, given the number of leaked objects, I suspect that what you've found is not the only significant leak. – Stephen C Jul 30 '13 at 12:02
  • I will start with this, I have noticed that the heap is between 1GB and 1.5GB but the PermGen is between 55MB to 85MB, I think this needs some attention too, would this cause the GC to run pretty much constantly? Do you have a recommendation of size for the PermGen? – iggyweb Jul 30 '13 at 12:40
  • I've removed mysql-connector-java-5.1.21-bin.jar and replaced it with mysql-connector-java-5.1.25-bin.jar, and added the lines -XX:PermSize=512m and -XX:MaxPermSize=512m to the JVM settings. Just wondering if I need to add -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:+UseConcMarkSweepGC. – iggyweb Jul 30 '13 at 13:04
  • You don't need to fiddle with permgen unless you get OOME exceptions that say you are out of permgen memory. The permgen levels should be stable unless you are doing hot deploys. **Don't change lots of things at once** ... 'cos if you do, you won't be able to figure out what actually made the difference. Fiddling with the GC parameters will NOT cure a storage leak. Leave them alone. – Stephen C Jul 30 '13 at 14:10
  • *"PermGen is between 55MB to 85MB, ... would this cause the GC to run pretty much constantly?"* Probably not, but it depends on what is actually doing permgen allocations. (Hot deploys?) – Stephen C Jul 30 '13 at 14:13
  • Thank you for your advice. Am I better off removing the lines -XX:PermSize=512m and -XX:MaxPermSize=512m? I can see the heap fluctuating between 300 and 600MB, showing that memory is being reclaimed, which is a good thing. Any idea how long before threads are cleaned up? I am showing 6 Abandoned connection cleanup threads; those plus the Attach Listener are showing 18 minutes and climbing. – iggyweb Jul 30 '13 at 14:16
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/34458/discussion-between-iggyweb-and-stephen-c) – iggyweb Jul 30 '13 at 14:28
  • After 16 hours of uptime, I can see Classes Total loaded: 8,118 which is significantly more than yesterday with Total unloaded: 68. The Heap has now risen to an average between 350 and 700MB and the PermGen has risen by 17MB. If I click on a function that generates a PDF, total loaded is now 8,536 and total unloaded is still 68. If I close the PDF window the total loaded and unloaded remain at the elevated levels, should classes not unload when finished executing? – iggyweb Jul 31 '13 at 07:38
  • Classes will be unloaded when they become unreachable. But that only happens when the relevant class loader becomes unreachable. If this is your problem, you need to read up on classloader memory leaks. – Stephen C Jan 20 '14 at 22:21
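
One commonly cited source of class loader leaks in web apps is a JDBC driver that stays registered with `java.sql.DriverManager` after the app has been stopped. A minimal sketch of deregistering such drivers is shown here. `JdbcCleanup` is a hypothetical helper, and the assumption is that it would be called from the app's shutdown hook (for example a `ServletContextListener`) with the web app's own class loader. It does nothing for a driver JAR placed in Tomcat's shared lib directory, because that driver is loaded by a different class loader.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Enumeration;

// Hypothetical helper, not part of Tomcat or of any JDBC driver.
final class JdbcCleanup {
    // Deregisters every JDBC driver that was loaded by the given class loader
    // and returns how many were removed.
    static int deregisterDrivers(ClassLoader webappLoader) {
        int removed = 0;
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver driver = drivers.nextElement();
            if (driver.getClass().getClassLoader() == webappLoader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    removed++;
                } catch (SQLException e) {
                    // Nothing more can be done at shutdown; report and carry on.
                    System.err.println("Could not deregister " + driver + ": " + e);
                }
            }
        }
        return removed;
    }
}
```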

I migrated all projects to Tomcat 7.0.42 and my errors have disappeared. Our websites are far more stable and slightly faster, we are using less memory, and CPU usage is far better.

iggyweb

Start the server in a local dev environment, attach a profiler (preferably YourKit), and take heap dumps periodically. You will see growth in byte[] objects, and with this tool you can connect those byte[] instances to the application class that is leaking them, which will help you identify the defect in the code.
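
If a profiler is not at hand, periodic dumps can also be written from inside the JVM. The sketch below assumes a HotSpot JVM, since it uses `com.sun.management.HotSpotDiagnosticMXBean`; the class name `HeapDumper` and the file naming are hypothetical. Passing `true` as the second argument to `dumpHeap` restricts the dump to objects that are still reachable, which is what matters when hunting a leak.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper; assumes a HotSpot JVM.
final class HeapDumper {
    // Writes a heap dump of reachable objects into dir and returns the file.
    static File dump(File dir, String label) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // The target file must not exist yet, so a timestamp is added to the name.
        File out = new File(dir, label + "-" + System.currentTimeMillis() + ".hprof");
        bean.dumpHeap(out.getAbsolutePath(), true); // true = live objects only
        return out;
    }
}
```

Two dumps taken some hours apart can then be loaded into a dump analyser and compared.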

jmj
  • Using Eclipse Memory Analyzer I can see the byte[] instances which are due to ClientAbortExceptions, the user has either left the web page or site before the page has fully loaded and therefore images haven't fully downloaded. – iggyweb Jul 31 '13 at 12:15
  • Do you mean fix the ClientAbortException? Do I have to extend each function to do so or can it be done by declaring a class in the server.xml file to handle severed requests globally? – iggyweb Aug 01 '13 at 10:50
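
On the question of resources being orphaned when a client aborts: the garbage collector reclaims memory, but it does not close database connections or streams promptly, so code that writes to the response should release its resources in a `finally` block. `ClientAbortException` is an `IOException`, so the write fails with an ordinary I/O error. The sketch below is an assumption about how such code might be structured, not the apps' actual code; `SafeWriter` is a hypothetical name and `resource` stands in for a JDBC connection, result set or file stream. It uses plain try/finally because try-with-resources needs Java 7.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: "resource" stands in for a connection or stream.
final class SafeWriter {
    // Writes data to the client and always closes the resource, even when
    // the write fails because the client has gone away.
    static void send(Closeable resource, OutputStream client, byte[] data) throws IOException {
        try {
            client.write(data);
            client.flush();
        } finally {
            resource.close();
        }
    }
}
```

The exception still propagates to the caller, so it continues to show up in monitoring, but the resource is released either way.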