
We are facing a strange issue in our Grails application. The total memory consumption shoots up in a very short period. The application runs well for a long time, but at a certain point the memory consumption rises until there is no more memory to consume (Xmx). It may go from 4 GB to 20 GB within 5 minutes. Once all the memory has been consumed, Tomcat becomes unresponsive and does not heal itself even if it is left alone. At some point I would expect an OutOfMemoryError, but that never happens even if we leave the application untouched. I can see that the garbage collector keeps running (we are using New Relic), but the consumed memory still does not go down.

When we noticed that garbage collection was happening in one big blow, we changed the garbage collector from ConcurrentMarkSweep to G1GC, but that has not helped either. To analyse the unresponsive JVM we took a thread dump and found that there were no deadlocks, but a lot of our threads were in a “BLOCKED” state:

Thread 6632: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2043 (Compiled frame)
 - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=442 (Compiled frame)
 - org.apache.tomcat.util.threads.TaskQueue.take() @bci=36, line=104 (Interpreted frame)
 - org.apache.tomcat.util.threads.TaskQueue.take() @bci=1, line=32 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=156, line=1068 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=26, line=1130 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=724 (Interpreted frame)
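
For reference, the deadlock check can also be done programmatically with the standard ThreadMXBean API; the snippet below is only a minimal sketch of that kind of check (the class name DeadlockCheck is made up for illustration):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Minimal sketch: report deadlocked threads, if any, via the platform ThreadMXBean.
public class DeadlockCheck {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // returns null when no deadlock exists
        if (ids == null) {
            System.out.println("No deadlocked threads");
            return;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.println(info); // thread name, state and (truncated) stack trace
        }
    }
}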

Only a few of these threads were in an “IN_NATIVE” state.

We even took a heap dump using jmap. The top consuming classes look like this: VisualVM heap analysis
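
For completeness, a heap dump like this can also be triggered from inside the application through the HotSpot diagnostic MXBean; the snippet below is only a minimal sketch, assuming a HotSpot JVM, and the output path and the class name HeapDumper are made-up examples:

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Minimal sketch: write a heap dump of live objects to a file, similar to "jmap -dump:live,...".
public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diagnostic = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // 'true' limits the dump to live (reachable) objects
        diagnostic.dumpHeap("/mnt/tomcat-logs/heap.hprof", true);
    }
}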

We are using Grails 2.3.8 with the MongoDB plugin 3.0.1 for GORM. We use Memcached for session sharing on Tomcat 7 and Redis for caching (spring-cache as well as native Redis).

Our server details are:

Server version: Apache Tomcat/7.0.35
Server built:   May 24 2013 09:52:20
Server number:  7.0.35.0
OS Name:        Linux
OS Version:     3.8.0-19-generic
Architecture:   amd64
JVM Version:    1.7.0_25-b15
JVM Vendor:     Oracle Corporation

We are really out of ideas on how to fix this. Any help or pointers in resolving this issue would be appreciated.

Himanshu
  • You need to make sure your application has enough memory in the first place. I answered a similar question here http://stackoverflow.com/questions/24169976/understanding-groovy-grails-classloader-leak/24172698#24172698 However, that doesn't rule out a memory leak in your application. Have you attached a profiler (I recommend YourKit or JProfiler)? – Graeme Rocher Jun 26 '14 at 08:28
  • Hi Graeme, we have given it 20 GB of memory, which should be more than enough: -Djava.awt.headless=true -Xmx20480m -XX:MaxPermSize=1024m -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -Duser.timezone=IST -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/mnt/tomcat-logs/garbage.log Also, the memory spike does not follow any pattern (which it might if there were a leak in certain portions). I am trying to attach a profiler to see if we can find anything. Is there anything specific I should be looking for while using JProfiler/YourKit? – Himanshu Jun 26 '14 at 09:11
  • You need to identify the objects that are consuming the memory. If a profiler isn't practical, add logging to your application and increase the log level, so that when the spike happens you can go through the logs and find out what your application was doing at the time of the spike (a rough sketch of such periodic heap logging is shown after these comments). – Graeme Rocher Jun 26 '14 at 10:44
  • Did you try to find out which URLs were called before Tomcat froze? You can activate the localhost_access_log in Tomcat and try to reproduce those calls... – Gil Jun 26 '14 at 11:17
  • Now that I think about it more, I remember we used to have a problem like that. The problem was that the JSON was not parsed correctly; it was cached, but only part of it. So when we tried to parse it, our memory went crazy. We had to deactivate the cache in some cases... Let me know if this could be the case and I can write more. – Gil Jun 26 '14 at 11:23
  • Gil, can you please elaborate more on “the JSON was not parsed correctly”? – Himanshu Jun 26 '14 at 11:29
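
Following up on Graeme's logging suggestion above, here is a rough sketch of what periodic heap logging could look like using the standard MemoryMXBean; the class name HeapUsageLogger and the one-minute interval are only examples, and in the real application this would run on a background thread rather than in main:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Minimal sketch: print heap usage once a minute so a spike can be correlated
// with what the application was doing (access log, business log) at that time.
public class HeapUsageLogger {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
            Thread.sleep(60000L);
        }
    }
}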

0 Answers