1

We have Java application using Spring 3.2.5 and running on Tomcat 7.

For some unknown reason for last couple of days it started after period of correct behavior to use 100% of single core (This is machine with 2 CPU 8 cores each), and than for a couple of moments its overall CPU usage is high but spread across Cores.

We started to investigate it from GC point - no luck there, trying much more/less memory, different GC type and configuration does not help.

When it is using 100% of single core we managed to dumt threads with jstack, there were a lot of such threads:

Thread 11260: (state = BLOCKED)
 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)

Thread 11375: (state = BLOCKED)
 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.EPollArrayWrapper.poll(long) @bci=18, line=269 (Compiled frame)

Thread 11421: (state = BLOCKED)
 - com.mysql.jdbc.Field.getStringFromBytes(int, int) @bci=144, line=719 (Compiled frame)

We started to investigate network traffic - but this seams not to be a case.

What else we could investigate? What can lead to such strange usage of CPU - while it is consuming all of single core, application is not responding.

There is not thread in RUNNING state, but there are many of:

Thread 12947: (state = IN_NATIVE)
 - java.net.PlainSocketImpl.$$YJP$$socketAccept(java.net.SocketImpl) @bci=0 (Interpreted frame)
 - java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) @bci=8 (Interpreted frame)
 - java.net.AbstractPlainSocketImpl.accept(java.net.SocketImpl) @bci=13, line=398 (Interpreted frame)
 - java.net.ServerSocket.implAccept(java.net.Socket) @bci=111, line=530 (Interpreted frame)
 - java.net.ServerSocket.accept() @bci=119, line=498 (Interpreted frame)
 - org.apache.catalina.core.StandardServer.await() @bci=269, line=453 (Interpreted frame)
 - org.apache.catalina.startup.Catalina.await() @bci=10, line=777 (Interpreted frame)
 - org.apache.catalina.startup.Catalina.start() @bci=272, line=723 (Interpreted frame)
 - sun.reflect.NativeMethodAccessorImpl.invoke0(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=0 (Interpreted frame)
 - sun.reflect.NativeMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=118, line=57 (Interpreted frame)
 - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=12, line=43 (Interpreted frame)
 - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=63, line=606 (Interpreted frame)
 - org.apache.catalina.startup.Bootstrap.start() @bci=43, line=321 (Interpreted frame)
 - org.apache.catalina.startup.Bootstrap.main(java.lang.String[]) @bci=187, line=455 (Interpreted frame)

$$YJP$$ is from YK profiler.

On other machine, having same issues I found 345 of 497 Threads again IN_NATIVE:

Thread 4979: (state = IN_NATIVE)
 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.EPollArrayWrapper.poll(long) @bci=18, line=269 (Compiled frame)
 - sun.nio.ch.EPollSelectorImpl.doSelect(long) @bci=28, line=79 (Compiled frame)
 - sun.nio.ch.SelectorImpl.lockAndDoSelect(long) @bci=37, line=87 (Compiled frame)
 - sun.nio.ch.SelectorImpl.select(long) @bci=30, line=98 (Compiled frame)
 - org.jboss.netty.channel.socket.nio.SelectorUtil.select(java.nio.channels.Selector) @bci=4, line=68 (Compiled frame)
 - org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(java.nio.channels.Selector) @bci=1, line=415 (Compiled frame)
 - org.jboss.netty.channel.socket.nio.AbstractNioSelector.run() @bci=56, line=212 (Compiled frame)
 - org.jboss.netty.channel.socket.nio.AbstractNioWorker.run() @bci=1, line=89 (Interpreted frame)
 - org.jboss.netty.channel.socket.nio.NioWorker.run() @bci=1, line=178 (Interpreted frame)
 - org.jboss.netty.util.ThreadRenamingRunnable.run() @bci=55, line=108 (Interpreted frame)
 - org.jboss.netty.util.internal.DeadLockProofWorker$1.run() @bci=14, line=42 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

I found as well one:

Thread 13591: (state = IN_JAVA)
 - org.hibernate.engine.Cascade.cascadeAssociation(java.lang.Object, org.hibernate.type.Type, org.hibernate.engine.CascadeStyle, java.lang.Object, boolean) @bci=16, line=239 (Compiled frame; information may be imprecise)
 - org.hibernate.engine.Cascade.cascadeProperty(java.lang.Object, org.hibernate.type.Type, org.hibernate.engine.CascadeStyle, java.lang.Object, boolean) @bci=42, line=193 (Compiled frame)
 - org.hibernate.engine.Cascade.cascade(org.hibernate.persister.entity.EntityPersister, java.lang.Object, java.lang.Object) @bci=224, line=154 (Compiled frame)
 - org.hibernate.event.def.AbstractFlushingEventListener.cascadeOnFlush(org.hibernate.event.EventSource, org.hibernate.persister.entity.EntityPersister, java.lang.Object, java.lang.Object) @bci=60, line=154 (Compiled frame)
 - org.hibernate.event.def.AbstractFlushingEventListener.prepareEntityFlushes(org.hibernate.event.EventSource) @bci=106, line=145 (Compiled frame)
 - org.hibernate.event.def.AbstractFlushingEventListener.flushEverythingToExecutions(org.hibernate.event.FlushEvent) @bci=79, line=88 (Compiled frame)
 - org.hibernate.event.def.DefaultAutoFlushEventListener.onAutoFlush(org.hibernate.event.AutoFlushEvent) @bci=31, line=58 (Compiled frame)
 - org.hibernate.impl.SessionImpl.autoFlushIfRequired(java.util.Set) @bci=83, line=997 (Compiled frame)
 - org.hibernate.impl.SessionImpl.list(java.lang.String, org.hibernate.engine.QueryParameters) @bci=55, line=1149 (Compiled frame)
 - org.hibernate.impl.QueryImpl.list() @bci=33, line=102 (Compiled frame)
 - org.hibernate.impl.AbstractQueryImpl.uniqueResult() @bci=7, line=835 (Compiled frame)

We might be missing some resources, but this happens in both - high and low server load, and in different times.

Konrad Pawlus
  • 1,641
  • 3
  • 14
  • 20
  • Try looking at threads that aren't blocked. – David Schwartz Mar 19 '15 at 16:37
  • David, this is even more strange, there are no threads in RUNNING state, but there are many in state IN_NATIVE and two or three in IN_JAVA. I Edited my question with this detail. – Konrad Pawlus Mar 19 '15 at 17:39
  • Just to be sure, did you look at GC log (enabled with `-XX:+PrintGC` or `-XX:+PrintGCDetails`) to see whether the GC is running at these situations? (Though it seems unlikely if you have tried different GCs, but the log is usually the first place to look at for investigating such things). – Philipp Wendler Mar 19 '15 at 17:52
  • You are using a recent JRE? I'm asking because there was an [epollWait](https://github.com/netty/netty/issues/327#issuecomment-12420871) bug which had similar symptoms. – vanOekel Mar 19 '15 at 18:48
  • I do not know if this is your case , but it might be helpful to you ["What could cause global Tomcat/JVM slowdown?"](http://stackoverflow.com/questions/14876447/what-could-cause-global-tomcat-jvm-slowdown) – Tobías Mar 19 '15 at 19:53
  • @PhilippWendler - yes, both, and we tried YourJit as well, no help there, – Konrad Pawlus Mar 19 '15 at 23:23
  • @vanOekel yes, 1.7-75 – Konrad Pawlus Mar 19 '15 at 23:23
  • @Tobías I will give it a try, will let you know if this helped, – Konrad Pawlus Mar 19 '15 at 23:23

1 Answers1

1

I hope this might have been solved, if not I would like to suggest following steps to find cause of high CPU Utilization

  1. Using OS level command find Which thread is taking high CPU, e.g. TOP with the H key press or -H command line option (top -H -p ), For AIX htop with F5, for windows use perfmon or use Process explorer (https://technet.microsoft.com/en-us/sysinternals/bb896653.aspx)

  2. Identify thread that is causing high CPU and note thread ID (This may be in hex)

  3. Take thread dump and find the thread stack for noted thread

  4. Try to understand which all methods called and which one is CPU intensive method.

This can be monitored in JVisual VM Sampler or JRofiler very easily.

Shambhu
  • 37
  • 1
  • 8