22

A production environment became very slow recently. The process's CPU usage reached 200%, but it kept working. After I restarted the service it functioned normally again. I have several symptoms: the Par Survivor Space heap was empty for a long time, and garbage collection took about 20% of the CPU time.

JVM options:

-XX:+CMSParallelRemarkEnabled, -XX:+HeapDumpOnOutOfMemoryError, -XX:+UseConcMarkSweepGC, -XX:+UseParNewGC, -XX:HeapDumpPath=heapdump.hprof, -XX:MaxNewSize=700m, -XX:MaxPermSize=786m, -XX:NewSize=700m, -XX:ParallelGCThreads=8, -XX:SurvivorRatio=25, -Xms2048m, -Xmx2048m

    Arch                amd64
    Dispatcher          Apache Tomcat
    Dispatcher Version  7.0.27
    Framework           java
    Heap initial (MB)   2048.0
    Heap max (MB)       2022.125
    Java version        1.6.0_35
    Log path            /opt/newrelic/logs/newrelic_agent.log
    OS                  Linux
    Processors          8
    System Memory       8177.964, 8178.0

More info in the attached picture: when the problem occurred, on the non-heap side both the used Code Cache and the used CMS Perm Gen dropped to half.

I took this info from New Relic.

The question is: why does the server become so slow?

Sometimes the server stops completely. We did find one problem with PDFBox: when an uploaded PDF contains certain fonts, it crashes the JVM.

More info: I observed that the Old Gen fills up every day, so for now I restart the server daily. After a restart everything is fine, but the Old Gen fills up again over the course of the day and the server slows down until it needs another restart.
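To confirm the daily Old Gen growth without waiting for the New Relic charts, you can poll the heap pools from inside the JVM using the standard `java.lang.management` API. A minimal sketch (the class name `OldGenMonitor` is made up for illustration; the pool name is collector-dependent, e.g. "CMS Old Gen" with the CMS collector):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class OldGenMonitor {
    public static void main(String[] args) {
        // Print usage for every heap pool; on a CMS JVM this includes
        // the old generation ("CMS Old Gen") and the survivor spaces.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                long used = pool.getUsage().getUsed();
                long max = pool.getUsage().getMax(); // -1 if undefined
                System.out.printf("%s: %d / %d bytes%n", pool.getName(), used, max);
            }
        }
    }
}
```

Logging this periodically (e.g. from a scheduled task) would show whether Old Gen usage only ever grows between restarts, which would point to a leak rather than a tuning problem.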

Bogdan
    So, what's the question? – Frank Pavageau Oct 17 '13 at 21:08
  • If your perm space is exhausted and your new space is almost empty, wouldn't it make sense to decrease the size of the new space and allocate more space to the perm gen? If you do this and wind up with the same problem, you may have a memory leak. – TMN Oct 18 '13 at 14:42
  • The perm gen contains the compiled classes and some other stuff. The problem I see is that the Old Gen keeps building up; I have to restart the server every day to return to normal. Where the perm gen graph (and the rest) drops is where Tomcat was restarted. – Bogdan Oct 23 '13 at 14:51

1 Answer

27

By default, CMS starts a concurrent collection once OldGen is 70% full. If it can't free enough memory to get back below this boundary, it will run concurrently all the time, which slows down the application significantly. If usage gets close to a completely full OldGen, it panics and falls back to a stop-the-world GC pause, which can be very long (like 20 seconds). You probably need more headroom in OldGen (and make sure your app does not leak memory, of course!). Additionally, you can lower the threshold at which a concurrent collection starts (default 70%) using

-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50

This triggers a concurrent collection at 50% occupancy and increases the chance that CMS finishes collecting in time. It only helps if the problem is the allocation rate; from your charts it looks like not-enough-headroom/memory-leak combined with a too-high -XX:CMSInitiatingOccupancyFraction. Give OldGen at least 500 MB to 1 GB more space.
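Putting the suggestions together, Tomcat's startup options might look like the sketch below. The flag names come from the question and answer above; the heap sizes are illustrative assumptions only (a 1 GB bump over the original -Xmx2048m), not measured values:

```shell
# Hypothetical CATALINA_OPTS for bin/setenv.sh: same young gen as before,
# 1 GB more total heap (so the extra goes to OldGen), and an earlier,
# fixed CMS trigger at 50% occupancy.
export CATALINA_OPTS="-Xms3072m -Xmx3072m \
  -XX:NewSize=700m -XX:MaxNewSize=700m \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:+CMSParallelRemarkEnabled \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:CMSInitiatingOccupancyFraction=50 \
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=heapdump.hprof"
```

Note that -XX:+UseCMSInitiatingOccupancyOnly is needed alongside -XX:CMSInitiatingOccupancyFraction, because otherwise the JVM uses the fraction only as a starting hint and adapts the trigger on its own.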

R.Moeller