We have Java application using Spring 3.2.5 and running on Tomcat 7.
For some unknown reason for last couple of days it started after period of correct behavior to use 100% of single core (This is machine with 2 CPU 8 cores each), and than for a couple of moments its overall CPU usage is high but spread across Cores.
We started to investigate it from GC point - no luck there, trying much more/less memory, different GC type and configuration does not help.
When it is using 100% of single core we managed to dumt threads with jstack, there were a lot of such threads:
Thread 11260: (state = BLOCKED)
- sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
Thread 11375: (state = BLOCKED)
- sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
- sun.nio.ch.EPollArrayWrapper.poll(long) @bci=18, line=269 (Compiled frame)
Thread 11421: (state = BLOCKED)
- com.mysql.jdbc.Field.getStringFromBytes(int, int) @bci=144, line=719 (Compiled frame)
We started to investigate network traffic - but this seams not to be a case.
What else we could investigate? What can lead to such strange usage of CPU - while it is consuming all of single core, application is not responding.
There is not thread in RUNNING state, but there are many of:
Thread 12947: (state = IN_NATIVE)
- java.net.PlainSocketImpl.$$YJP$$socketAccept(java.net.SocketImpl) @bci=0 (Interpreted frame)
- java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) @bci=8 (Interpreted frame)
- java.net.AbstractPlainSocketImpl.accept(java.net.SocketImpl) @bci=13, line=398 (Interpreted frame)
- java.net.ServerSocket.implAccept(java.net.Socket) @bci=111, line=530 (Interpreted frame)
- java.net.ServerSocket.accept() @bci=119, line=498 (Interpreted frame)
- org.apache.catalina.core.StandardServer.await() @bci=269, line=453 (Interpreted frame)
- org.apache.catalina.startup.Catalina.await() @bci=10, line=777 (Interpreted frame)
- org.apache.catalina.startup.Catalina.start() @bci=272, line=723 (Interpreted frame)
- sun.reflect.NativeMethodAccessorImpl.invoke0(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=0 (Interpreted frame)
- sun.reflect.NativeMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=118, line=57 (Interpreted frame)
- sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=12, line=43 (Interpreted frame)
- java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=63, line=606 (Interpreted frame)
- org.apache.catalina.startup.Bootstrap.start() @bci=43, line=321 (Interpreted frame)
- org.apache.catalina.startup.Bootstrap.main(java.lang.String[]) @bci=187, line=455 (Interpreted frame)
$$YJP$$ is from YK profiler.
On other machine, having same issues I found 345 of 497 Threads again IN_NATIVE:
Thread 4979: (state = IN_NATIVE)
- sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
- sun.nio.ch.EPollArrayWrapper.poll(long) @bci=18, line=269 (Compiled frame)
- sun.nio.ch.EPollSelectorImpl.doSelect(long) @bci=28, line=79 (Compiled frame)
- sun.nio.ch.SelectorImpl.lockAndDoSelect(long) @bci=37, line=87 (Compiled frame)
- sun.nio.ch.SelectorImpl.select(long) @bci=30, line=98 (Compiled frame)
- org.jboss.netty.channel.socket.nio.SelectorUtil.select(java.nio.channels.Selector) @bci=4, line=68 (Compiled frame)
- org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(java.nio.channels.Selector) @bci=1, line=415 (Compiled frame)
- org.jboss.netty.channel.socket.nio.AbstractNioSelector.run() @bci=56, line=212 (Compiled frame)
- org.jboss.netty.channel.socket.nio.AbstractNioWorker.run() @bci=1, line=89 (Interpreted frame)
- org.jboss.netty.channel.socket.nio.NioWorker.run() @bci=1, line=178 (Interpreted frame)
- org.jboss.netty.util.ThreadRenamingRunnable.run() @bci=55, line=108 (Interpreted frame)
- org.jboss.netty.util.internal.DeadLockProofWorker$1.run() @bci=14, line=42 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
I found as well one:
Thread 13591: (state = IN_JAVA)
- org.hibernate.engine.Cascade.cascadeAssociation(java.lang.Object, org.hibernate.type.Type, org.hibernate.engine.CascadeStyle, java.lang.Object, boolean) @bci=16, line=239 (Compiled frame; information may be imprecise)
- org.hibernate.engine.Cascade.cascadeProperty(java.lang.Object, org.hibernate.type.Type, org.hibernate.engine.CascadeStyle, java.lang.Object, boolean) @bci=42, line=193 (Compiled frame)
- org.hibernate.engine.Cascade.cascade(org.hibernate.persister.entity.EntityPersister, java.lang.Object, java.lang.Object) @bci=224, line=154 (Compiled frame)
- org.hibernate.event.def.AbstractFlushingEventListener.cascadeOnFlush(org.hibernate.event.EventSource, org.hibernate.persister.entity.EntityPersister, java.lang.Object, java.lang.Object) @bci=60, line=154 (Compiled frame)
- org.hibernate.event.def.AbstractFlushingEventListener.prepareEntityFlushes(org.hibernate.event.EventSource) @bci=106, line=145 (Compiled frame)
- org.hibernate.event.def.AbstractFlushingEventListener.flushEverythingToExecutions(org.hibernate.event.FlushEvent) @bci=79, line=88 (Compiled frame)
- org.hibernate.event.def.DefaultAutoFlushEventListener.onAutoFlush(org.hibernate.event.AutoFlushEvent) @bci=31, line=58 (Compiled frame)
- org.hibernate.impl.SessionImpl.autoFlushIfRequired(java.util.Set) @bci=83, line=997 (Compiled frame)
- org.hibernate.impl.SessionImpl.list(java.lang.String, org.hibernate.engine.QueryParameters) @bci=55, line=1149 (Compiled frame)
- org.hibernate.impl.QueryImpl.list() @bci=33, line=102 (Compiled frame)
- org.hibernate.impl.AbstractQueryImpl.uniqueResult() @bci=7, line=835 (Compiled frame)
We might be missing some resources, but this happens in both - high and low server load, and in different times.