
System Details:

  • Red Hat Enterprise Linux 6.3
  • Sun Java JDK 1.6.0_38
  • Apache Tomcat 7.0.34
  • VMWare Environment

Me: medium-experienced Linux admin, trial-by-fire Java/Tomcat admin, almost no understanding of Java programming/code.

The environment variables I've set are the result of months of digging through Stack Overflow posts, Oracle docs, and some 300 other web posts, much of it conflicting or describing a different Java VM version.

Problem: Sometimes the Tenured generation (also called Old) is not emptying / being garbage collected. This causes slow performance for our end users. Initiating a garbage collection from VisualVM sometimes resolves the issue, but not often. I am usually able to take a heap dump, but unfortunately I don't know what I'm looking for. What information should I be looking for? I'm sure there are problems with the code, but "fixing the code" is not an immediate option. If it's as simple as throwing more memory/money at it, I'll do that, but I'd like Java to just empty the darn trash. If more details are needed, I'm happy to provide them; I may have to censor some content, as there is some private information.
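For catching the "tenured not emptying" state as it happens, one option is to sample `jstat -gcutil` against the Tomcat pid and watch the O (old-gen) column. A minimal sketch, assuming the JDK 6 tools are on PATH; the sample line below is illustrative, not from this system (live data would come from something like `jstat -gcutil <tomcat-pid> 5000`):

```shell
# Illustrative `jstat -gcutil` sample line
# (columns: S0 S1 E O P YGC YGCT FGC FGCT GCT).
SAMPLE='  0.00  34.21  58.73  92.15  74.32   1234   45.678    56   12.345   58.023'

# The 4th field is old-gen (tenured) occupancy as a percentage.
OLD=$(echo "$SAMPLE" | awk '{print $4}')

# Flag when tenured crosses the CMS initiating threshold (75% here,
# matching the CMSInitiatingOccupancyFraction in the config below).
awk -v old="$OLD" 'BEGIN { if (old + 0 > 75) print "tenured above threshold: " old "%" }'
```

Wrapping that in a cron job or loop gives a cheap alarm without keeping VisualVM attached.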

setenv.sh contents

export JAVA_HOME="/usr/java/jdk1.6.0_38"
export CATALINA_OPTS="$CATALINA_OPTS -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"
export CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx512m"
export CATALINA_OPTS="$CATALINA_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export CATALINA_OPTS="$CATALINA_OPTS -XX:+CMSClassUnloadingEnabled"
export CATALINA_OPTS="$CATALINA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
export CATALINA_OPTS="$CATALINA_OPTS -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
export CATALINA_OPTS="$CATALINA_OPTS -server"
export CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote.port=8008"
export CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote.ssl=false"
export CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
export CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote.password.file=/home/otrack/tomcat7/conf/management.jmxremote.password"
export CATALINA_OPTS="$CATALINA_OPTS -Djava.rmi.server.hostname=10.47.10.113"
export JPDA_OPTS="-agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=n"
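Since `-verbose:gc -XX:+PrintGCDetails` is already enabled above, the GC lines in catalina.out can answer "how much memory does each collection recover" directly. A sketch that pulls the whole-heap before/after figures out of one ParNew line; the line itself is made up for illustration, in the JDK 6 log format:

```shell
# Illustrative JDK 6 -verbose:gc / -XX:+PrintGCDetails line (not from this system):
LINE='1234.567: [GC 1234.567: [ParNew: 118016K->12928K(118016K), 0.0123456 secs] 494592K->401408K(524288K), 0.0124567 secs]'

# The final "beforeK->afterK(totalK)" group is whole-heap occupancy;
# before minus after is what this collection reclaimed.
echo "$LINE" |
  sed 's/.*] \([0-9]*\)K->\([0-9]*\)K(\([0-9]*\)K).*/\1 \2/' |
  awk '{ printf "reclaimed %d K (%.0f M)\n", $1 - $2, ($1 - $2) / 1024 }'
```

Run over a day's log, this shows whether collections are reclaiming less and less, which is the usual signature of tenured filling with live (or leaked) objects.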
  • When it does garbage collect, how much memory does it recover? – Lee Meador Feb 05 '13 at 18:53
  • Depends, Right now one of them (there are four identical) is at 483M, and just GC'd down to 325M. It has smaller and larger GCs. – Ross H Feb 05 '13 at 18:55
  • Is VMWare set to move the memory available up and down? Are there plenty of CPUs provided? – Lee Meador Feb 05 '13 at 19:04
  • VMWare doesn't change the physical memory available to the operating system, it allocates 896 to the machine. Machine boots, uses memory for OS, daemons, etc. JavaVM starts and java, being java, takes all the memory it's assigned and says "MINE". Anything left over, VMWare borrows and gives back as needed. Lots of CPU horse power available from the VMBackend, dual 6 core opterons, but the OS is configured for 2 CPU cores. Plenty of horse power. – Ross H Feb 05 '13 at 19:16
  • Did you read this one? http://stackoverflow.com/questions/9697754/cms-garbage-collector-when-does-it-run is where they mention that CMSInitiatingOccupancyFraction only applies to the 1st GC unless you also set +UseCMSInitiatingOccupancyOnly. – Lee Meador Feb 06 '13 at 00:02
  • Also, those memory sizes you gave me show it's ignoring the 70% threshold, but I think that gets modified if GC takes longer than it thinks it should. (But that would cause it to kick off more often, which isn't happening for you.) – Lee Meador Feb 06 '13 at 00:03
  • 1
    Fantastic information, thank you. Next time one of the systems goes down, I will implement that change and report back with results. Is it just me or does that additional flag have a strange name considering what it does? – Ross H Feb 06 '13 at 16:04
  • So, I had high hopes that this would resolve our problem, but it did not. I added -XX:+UseCMSInitiatingOccupancyOnly; it is showing as an active JVM argument in VisualVM, but our tenured is full and we're now CPU thrashing as it does lots of what I assume are small garbage collections. I definitely appreciate all the prior help and all future help. – Ross H Feb 07 '13 at 21:37
  • What does increasing the -Xmx memory do? Does it just wait longer to start thrashing or does it thrash more slowly as GC slows down or what? – Lee Meador Feb 07 '13 at 21:59
  • I've increased the -Xms & -Xmx from 512M to 768M to test this. I was told that we should be aiming to keep them the same so we don't spend any time/resources "growing" the heap except at initial start (when we don't have actual use). I additionally lowered the PermSize & MaxPermSize from 128M to 90M as we weren't passing 80M and have GCing of perm enabled. I'll report back to see if this fixes the problem. – Ross H Feb 08 '13 at 18:32
  • Additionally, we deployed a code push that removed synchronization on multiple objects that we believe may have been causing some thread locks. From what I'm told, these locked threads may cause additional memory to be eaten; however, shouldn't they have been GC'd anyway if they/their *influence* ended up in Tenured? – Ross H Feb 08 '13 at 18:34
  • We shall see what happens. – Lee Meador Feb 08 '13 at 23:09
  • So we were fine on tenured memory all of Friday & Saturday. Sunday afternoon I swung by the office to check in. Tenured memory on one of the four systems was maxed out, as was Eden (new) and Survivor. 1 CPU was thrashing back and forth doing garbage collections (it says). Will re-post the current config, because I've made some tweaks as recommended since the first post. – Ross H Feb 11 '13 at 14:45
  • Drat, the interface is not great for posting config files. – Ross H Feb 11 '13 at 14:48
  • -Xms768m -Xmx768m -XX:PermSize=90m -XX:MaxPermSize=90m -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode – Ross H Feb 11 '13 at 14:51
  • So not being a coder and only having a cursory understanding of threads vs objects vs references. Is there something specific I should be looking for in our heap dump. I took one before restarting the server on Sunday. Currently using YourKit Java Profiler, but can load up pretty much anything with a trial if there is a recommended tool. – Ross H Feb 11 '13 at 14:55
  • Doing a basic "check" of the heap dump, I examined "Biggest objects (dominators)" under yourkit. There is a line that says org.apache.tomcat.util.threads.TaskThread [Stack Local, Thread] "http-bio-8080-exec-2872" and it is by far the largest with a retained size of 602,935,448. Seems pretty big, but as I mentioned, the threads vs memory stuff is beyond me at the moment. – Ross H Feb 11 '13 at 14:56
  • I looked for the first time at the YourKit web page. But it shows the "dominators" report. Did you drill down under the TaskThread to see what inside there was taking up the 600 Mb? The TaskThread class seems to be the one Tomcat uses to instantiate a new thread. (I know some guys who are really happy with 'dynaTrace', but it's not cheap. I also have heard good things about VMWare's profiler, but it may only run on the tcServer version of Tomcat or only in VFabric. The expensive version is Hyperic.) – Lee Meador Feb 11 '13 at 16:50
  • ->org.apache.tomcat.utils.threads.TaskThread [Stack Local, Thread] "http-bio-8080-exec-2872" -->net.sf.jasperreports.engine.fill.JRVerticalFiller [Stack Local] --->net.sf.jasperreports.engine.JasperPrint ---->java.util.ArrayList ----->java.lang.Object[7634068]. Clicking QuickInfo on that last item shows: Distance to nearest GC root: 4; Shallow size: 30,536,288; Retained objects (includes the object itself): 17,878,889; Retained size (includes shallow size): 602,661,408. I believe JasperReport is an object used to generate PDF reports for our application. – Ross H Feb 11 '13 at 19:54
  • It says there is one ArrayList object (sort of an array of data elements) with 600 Mb of stuff in it. I'm guessing the array length is pretty large since the base object occupies 35 Mb. Jasper Reports is a common report generator. I'm guessing the reports are pretty big sometimes. What it doesn't say is whether this object is in need of GC or not. – Lee Meador Feb 11 '13 at 20:55
  • If you just caught it in the middle of report generation, or there is a report cache that keeps them for a while (on purpose), you are gonna need more memory. If not, it might be uncollectable (via GC) because of a bug in the program. Finally, it might be that it should be GCed but isn't for some reason. – Lee Meador Feb 11 '13 at 20:56
  • Does Jasper Reports suggest JVM settings when used inside Tomcat? What happens if you let the JVM pick how to GC? – Lee Meador Feb 11 '13 at 20:58
  • I'll have to look into Jasper Reports and JVM settings. This thread was generating a 1 page pdf with a small picture on it, nothing giant, it should have been done fairly quickly. It was running for more than 15 minutes though. I would blame the lack of available memory/CPU thrashing, but it was the cause of said lack of memory. When we kicked off this new environment, we didn't have any declarations on how/when to GC and it would fill up in a couple of hours and hang. That may have been because of our bad locked object code. I haven't changed it back to use defaults yet. – Ross H Feb 11 '13 at 21:55
  • Sorry for the lack of updates, we've been fighting other java issues in house last few days. We've not had a system tie up with full tenured memory since the last one, so we've not repeated the problem. This is of course good for the end user, bad for us, as we know we haven't 'fixed' the problem and it will probably crop up again around 3AM when I'm on call. – Ross H Feb 19 '13 at 15:00
  • Hi, I'm having the same issue on GlassFish with a Java EE web app. Have you resolved this problem? Or have you at least found the root cause or a workaround? Thanks. – Oskars Pakers Apr 07 '14 at 12:31
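For readers hitting the same wall: before reaching for a full profiler, `jmap -histo:live <pid>` gives a quick per-class view of what is filling tenured, which might have surfaced the huge Object[] behind the JasperPrint here without a full dump. A sketch that formats one illustrative histogram line; the numbers are invented, not from this system:

```shell
# Illustrative `jmap -histo` line: rank, #instances, #bytes, class name.
SAMPLE='   1:       7634068      244290176  [Ljava.lang.Object;'

# Convert bytes to MB for a readable summary of the top offender.
echo "$SAMPLE" | awk '{ printf "%s: %.0f MB across %d instances\n", $4, $3 / 1048576, $2 }'
```

On the live box, piping the real histogram through `head -20` after a slowdown, and diffing it against one taken when the system is healthy, shows which classes are accumulating.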

0 Answers