
I have tried everything to find the cause, but I'm not sure what's happening. I also tried JProfiler, but it doesn't return any results because the CPU is already at 100%.

I am running a Spring application (WAR file) on Tomcat 8 on Amazon Linux. It performs no database operations, but it does call a REST API, running in a separate environment, for various operations. Three servers behind a load balancer in Elastic Beanstalk all have the same issue. They handle an average of 518K requests per hour with an average latency of 72.2 milliseconds.

I don't think the code itself is the problem.

I just ran kill -3 pid (on the Java process) and got some output in the catalina.out file. I can see the following error again and again:

"http-nio-8080-exec-13" #42 daemon prio=5 os_prio=0 tid=0x00007f0898005800 nid=0xfb2 waiting on condition [0x00007f0882dec000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000f1694f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)

"http-nio-8080-exec-18" #41 daemon prio=5 os_prio=0 tid=0x00007f088c024000 nid=0xfb1 waiting on condition [0x00007f0882eed000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000f1694f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)

and at the end is the following information:

"VM Thread" os_prio=0 tid=0x00007f08d8081000 nid=0xf8b runnable 

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f08d801e800 nid=0xf89 runnable 

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f08d8020800 nid=0xf8a runnable 

"VM Periodic Task Thread" os_prio=0 tid=0x00007f08d80f0000 nid=0xf92 waiting on condition 

JNI global references: 478

Heap
 PSYoungGen      total 85504K, used 28368K [0x00000000fab00000, 0x0000000100000000, 0x0000000100000000)
  eden space 83968K, 33% used [0x00000000fab00000,0x00000000fc61cd08,0x00000000ffd00000)
  from space 1536K, 39% used [0x00000000ffd00000,0x00000000ffd97340,0x00000000ffe80000)
  to   space 1536K, 0% used [0x00000000ffe80000,0x00000000ffe80000,0x0000000100000000)
 ParOldGen       total 175104K, used 107142K [0x00000000f0000000, 0x00000000fab00000, 0x00000000fab00000)
  object space 175104K, 61% used [0x00000000f0000000,0x00000000f68a1a48,0x00000000fab00000)
 Metaspace       used 56563K, capacity 60632K, committed 60800K, reserved 1103872K
  class space    used 5218K, capacity 5736K, committed 5760K, reserved 1048576K

Could someone please explain what is happening here?

OK, here are some of the threads I pulled from the log which are in the running state

(couple of times in log)

"ajp-nio-8009-Acceptor-0" #23 daemon prio=5 os_prio=0 tid=0x00007f08d8535800 nid=0xfa0 runnable [0x00007f0883ffe000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    - locked <0x00000000f0881618> (a java.lang.Object)
    at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:682)
    at java.lang.Thread.run(Thread.java:745)

"ajp-nio-8009-ClientPoller-1" #22 daemon prio=5 os_prio=0 tid=0x00007f08d8534000 nid=0xf9f runnable [0x00007f08a83b2000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000f16a8100> (a sun.nio.ch.Util$2)
    - locked <0x00000000f16a80f0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000f16a7fc8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1034)
    at java.lang.Thread.run(Thread.java:745)

"ajp-nio-8009-ClientPoller-0" #21 daemon prio=5 os_prio=0 tid=0x00007f08d82f8800 nid=0xf9e runnable [0x00007f08a84b3000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000f16a8900> (a sun.nio.ch.Util$2)
    - locked <0x00000000f16a88f0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000f16a87c8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1034)
    at java.lang.Thread.run(Thread.java:745)

"http-nio-8080-Acceptor-0" #20 daemon prio=5 os_prio=0 tid=0x00007f08d82f7000 nid=0xf9d runnable [0x00007f08a85b4000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    - locked <0x00000000f0882f68> (a java.lang.Object)
    at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:682)
    at java.lang.Thread.run(Thread.java:745)

"http-nio-8080-ClientPoller-1" #19 daemon prio=5 os_prio=0 tid=0x00007f08d82f5800 nid=0xf9c runnable [0x00007f08a86b5000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000f1677900> (a sun.nio.ch.Util$2)
    - locked <0x00000000f16778f0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000f16777a8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1034)
    at java.lang.Thread.run(Thread.java:745)

"http-nio-8080-ClientPoller-0" #18 daemon prio=5 os_prio=0 tid=0x00007f08d82f4000 nid=0xf9b runnable [0x00007f08a87b6000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000000f1666290> (a sun.nio.ch.Util$2)
    - locked <0x00000000f1666280> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000f1666138> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1034)
    at java.lang.Thread.run(Thread.java:745)

There is another one. I'm not sure if it's causing a problem (it appears only once in the log):

"http-nio-8080-exec-131" #160 daemon prio=5 os_prio=0 tid=0x00007f088c100800 nid=0x153e runnable [0x00007f08727e5000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    - locked <0x00000000f688f4d0> (a java.io.BufferedInputStream)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
    - locked <0x00000000f6892258> (a sun.net.www.protocol.http.HttpURLConnection)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
    - locked <0x00000000f6892258> (a sun.net.www.protocol.http.HttpURLConnection)
    at sun.net.www.protocol.http.HttpURLConnection.getHeaderFields(HttpURLConnection.java:2966)
    at com.code.http.WebUtility.getUrlContents(WebUtility.java:163)

The code on this line is con.getHeaderFields(), where con is an HttpURLConnection:

try {
    con = (HttpURLConnection) url.openConnection();
    con.setDoInput(true);
    con.setDoOutput(true); // the request carries a body
    con.connect();
    writeJson(con);

    // getHeaderFields() blocks until the remote API has sent its response headers
    this.header = con.getHeaderFields(); // this is the line in running state
    this.code = con.getResponseCode();
    return readSuccessStream(con);
} catch (IOException e) {
    if (con != null) {
        return readFailureStream(con);
    }
    return "a nasty error occured";
} finally {
    if (con != null) {
        con.disconnect();
    }
}

and a few other threads in the running state

"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00007f08d80db000 nid=0xf91 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f08d80c8800 nid=0xf90 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f08d80bb000 nid=0xf8f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f08d80b9000 nid=0xf8e waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

and the last one

"main" #1 prio=5 os_prio=0 tid=0x00007f08d8009800 nid=0xf88 runnable [0x00007f08de871000]
   java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.catalina.core.StandardServer.await(StandardServer.java:446)
    at org.apache.catalina.startup.Catalina.await(Catalina.java:717)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:663)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:351)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:485)
PHP Avenger
  • To be clear, those are not error messages. Those are stack traces. `kill -3` instructs the JVM to write a stack trace for every thread. It's the same format as a stack trace from an exception, but it does *not* indicate an exception is being thrown. It just indicates what the thread is currently doing. – Brandon Nov 06 '16 at 16:56
  • Indeed, that is just a way to get the dump that I read about somewhere. I can see `java.lang.Thread.State: TIMED_WAITING (parking)` 113 times. Could this be the reason? – PHP Avenger Nov 06 '16 at 17:00
  • @PHPAvenger No, those threads are just blocking on a `LinkedBlockingQueue.poll()` call (the one with a timeout). That's not CPU intensive. – Kayaman Nov 06 '16 at 17:10
  • Right, those are normal. That is how Tomcat delegates HTTP request handling. One main acceptor thread accepts connections from the TCP socket, and uses a blocking queue to delegate work to a pool of threads. The threads you showed in your question are simply waiting for connections to be delegated to them. They are idle and as @Kayaman pointed out, they are not consuming CPU time. – Brandon Nov 06 '16 at 19:27
  • Look for threads which are in the `RUNNABLE` state. Runnable threads are actually putting load on the CPU. Also consider that the garbage collector thread (which may not be in the list since it's internal to the JVM) may be hogging all the cycles. – Brandon Nov 06 '16 at 19:29
  • @Brandon I have updated my question with more logging details, I can see following logs couple of times `sun.nio.ch.EPollArrayWrapper.epollWait`, please have look on question for more detail. Thanks – PHP Avenger Nov 06 '16 at 20:10
  • So it's hard for me to say, really. When I have seen this thread pattern before, it has usually indicated memory pressure: the JVM is thrashing to garbage collect enough memory to keep up with new allocation demands (which is what those threads are doing). However, I'm not confident in this scenario because you included a dump of GC status which does not show anywhere close to 100% usage. – Brandon Nov 06 '16 at 21:34
  • @Brandon If you could please summarize the states in `simple plain English please` and the point you said about GC (In short if you could please please please rewrite all your comments in the form of single answer in simple English for dummies). I will accept that answer. – PHP Avenger Nov 12 '16 at 17:24
  • @PHPAvenger Done as requested. I tried to make it as plain English as possible. Multi-threading is a complex topic. – Brandon Nov 13 '16 at 20:59
  • A java profiler is what you need. JVisualVM or YourKit. – bmargulies Nov 13 '16 at 21:13

1 Answer


Thread states

The states of a thread in a Java virtual machine are:

NEW
A thread that has not yet started is in this state.

RUNNABLE
A thread executing in the Java virtual machine is in this state.

BLOCKED
A thread that is blocked waiting for a monitor lock is in this state.

WAITING
A thread that is waiting indefinitely for another thread to perform a particular action is in this state.

TIMED_WAITING
A thread that is waiting for another thread to perform an action for up to a specified waiting time is in this state.

TERMINATED
A thread that has exited is in this state.

In Java, threads are represented as Thread objects (java.lang.Thread). When you create a new thread, literally with new Thread(), you create a thread, but it isn't doing anything yet. Once you call Thread.start(), it moves from the NEW state into the RUNNABLE state. Runnable means it has pending work for the CPU to perform. While running, a thread can move between RUNNABLE, BLOCKED, WAITING, and TIMED_WAITING. When a thread exits, it is left in the TERMINATED state.
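The lifecycle described above can be observed directly with Thread.getState(). The following is a minimal sketch, not anything taken from Tomcat; the class name ThreadStateDemo and the thread name demo-worker are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class ThreadStateDemo {

    /** Walks one thread through its lifecycle and records the state seen at each step. */
    static List<Thread.State> observeStates() {
        List<Thread.State> seen = new ArrayList<>();
        CountDownLatch release = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            try {
                release.await(); // parks here until the latch is counted down
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-worker"); // hypothetical name

        seen.add(worker.getState()); // created but not started: NEW

        worker.start();
        // Spin until the worker has parked inside await()
        while (worker.getState() != Thread.State.WAITING) {
            Thread.yield();
        }
        seen.add(worker.getState()); // parked with no timeout: WAITING

        release.countDown(); // let the worker finish
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        seen.add(worker.getState()); // run() has returned: TERMINATED
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observeStates()); // prints [NEW, WAITING, TERMINATED]
    }
}
```

The worker also passes through RUNNABLE between start() and await(), but that window is too short to sample reliably, so the sketch does not record it.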

When trying to diagnose high CPU usage by a Java process, we can rule out threads which are NEW or TERMINATED, since they are not running. A thread which is BLOCKED, WAITING, or TIMED_WAITING is not doing anything either. It is, literally, waiting for some event in the future.

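What does that mean? Only threads in the RUNNABLE state can be putting load on the CPU. Keep in mind that the JVM also reports a thread as RUNNABLE while it is blocked inside a native call such as accept0, epollWait, or socketRead0, so a RUNNABLE thread is a candidate, not proof.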
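To see which Java threads have actually consumed CPU, rather than inferring it from their state, the standard java.lang.management API can report accumulated CPU time per thread. This is a rough sketch under the assumption that you can run it inside the JVM you want to inspect; the class name CpuByThread and the output format are invented for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class CpuByThread {

    /** One sampled thread: its accumulated CPU time and its ThreadInfo snapshot. */
    private static final class Row {
        final long cpuNanos;
        final ThreadInfo info;

        Row(long cpuNanos, ThreadInfo info) {
            this.cpuNanos = cpuNanos;
            this.info = info;
        }
    }

    /** Returns one line per live Java thread (CPU ms, state, name), busiest first. */
    static List<String> report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = mx.isThreadCpuTimeSupported();
        List<Row> rows = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread has already exited
            if (info == null) {
                continue;
            }
            // getThreadCpuTime returns -1 when measurement is disabled or the thread died
            long cpuNanos = cpuSupported ? mx.getThreadCpuTime(id) : -1L;
            rows.add(new Row(cpuNanos, info));
        }
        rows.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos));

        List<String> lines = new ArrayList<>();
        for (Row r : rows) {
            lines.add(String.format("%8d ms  %-13s %s",
                    r.cpuNanos / 1_000_000, r.info.getThreadState(), r.info.getThreadName()));
        }
        return lines;
    }

    public static void main(String[] args) {
        report().forEach(System.out::println);
    }
}
```

Taking two reports a few seconds apart and comparing them shows which threads are burning CPU right now, as opposed to since startup. JVM-internal threads such as the GC task threads are not Java threads and do not appear in this list.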

What are all those threads?

Tomcat uses a pool of threads to be able to handle multiple requests at the same time.

The first thread is the acceptor thread. Its purpose is to listen on the network socket for incoming requests. But in order to handle multiple requests simultaneously, it needs to delegate the work. Otherwise, each request would lock up the server until it finishes, meaning only one user could connect to the web server at a time. So this thread simply accepts the connections and then hands them off to another thread so it can accept more connections.

"http-nio-8080-Acceptor-0" #20 daemon prio=5 os_prio=0 tid=0x00007f08d82f7000 nid=0xf9d runnable [0x00007f08a85b4000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    - locked <0x00000000f0882f68> (a java.lang.Object)
    at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:682)
    at java.lang.Thread.run(Thread.java:745)

These threads are the worker threads to which Tomcat assigns the requests:

"http-nio-8080-exec-13" #42 daemon prio=5 os_prio=0 tid=0x00007f0898005800 nid=0xfb2 waiting on condition [0x00007f0882dec000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000f1694f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
    at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)

That means the thread is sleeping. It is waiting for the acceptor thread to give it work. It is contributing nothing to your 100% CPU usage.
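The same idle pattern is easy to reproduce outside Tomcat. The sketch below starts one worker that polls an empty LinkedBlockingQueue with a timeout, as the stack trace above does, and reports the state it settles into. The class name IdleWorkerDemo, the thread name, and the 60-second timeout are assumptions chosen for illustration; they are not Tomcat's actual values.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class IdleWorkerDemo {

    /** Starts a worker that polls an empty queue and returns the state it settles into. */
    static Thread.State stateOfIdleWorker() {
        LinkedBlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();

        Thread worker = new Thread(() -> {
            try {
                // Same call as in the dump: a timed poll on an empty queue parks the thread
                Runnable task = tasks.poll(60, TimeUnit.SECONDS);
                if (task != null) {
                    task.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-exec-1"); // hypothetical name
        worker.setDaemon(true);
        worker.start();

        // Give the worker up to five seconds to reach the parked state
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (worker.getState() != Thread.State.TIMED_WAITING
                && System.nanoTime() < deadline) {
            Thread.yield();
        }
        Thread.State state = worker.getState();
        worker.interrupt(); // wake the worker so it exits promptly
        return state;
    }

    public static void main(String[] args) {
        System.out.println(stateOfIdleWorker());
    }
}
```

A thread parked like this is descheduled by the operating system until a task arrives or the timeout expires, which is why 113 of them cost essentially no CPU.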

Conclusion

The runnable threads I saw in your thread dumps show that reading from a network socket is where the CPU load is spent. I cannot say definitively why. There are two possible reasons: the network or the remote system is responding very slowly, or reading is slow because of memory pressure.
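If the remote system turns out to be slow, one thing worth checking is that the snippet in the question sets no timeouts, and HttpURLConnection waits indefinitely by default (a timeout of 0). The sketch below shows how timeouts could be set. The helper name openWithTimeouts and the values in main are placeholders, not recommendations, and this would bound how long a worker thread sits in socketRead0 rather than explain the CPU usage by itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

class TimeoutConfig {

    /** Opens (but does not yet connect) an HTTP connection with explicit timeouts. */
    static HttpURLConnection openWithTimeouts(String address, int connectMillis, int readMillis) {
        try {
            URL url = URI.create(address).toURL();
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setConnectTimeout(connectMillis); // max wait to establish the TCP connection
            con.setReadTimeout(readMillis);       // max wait for data once connected
            return con;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder address and values, for illustration only
        HttpURLConnection con = openWithTimeouts("http://localhost:8080/", 2000, 5000);
        System.out.println(con.getConnectTimeout() + " / " + con.getReadTimeout());
    }
}
```

With a read timeout in place, a stalled remote call fails with a SocketTimeoutException, which is an IOException and would therefore reach the catch block already present in the question's code.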

Most of the time when I have seen this, the CPU work was coming from the garbage collector, which runs in another thread. When there is pressure to allocate memory and little is available, the garbage collector has to do a lot of expensive work to find available memory while the other threads wait. This thread might not be visible in the thread dumps you showed since it's not a thread created in Java code, but is an internal part of the Java virtual machine. I can't say for sure if that is the issue in your case. It could be a network problem or problem with the remote system you are reading from. I suggest looking carefully at every thread to try to find a pattern.
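One way to test the garbage collection theory is to ask the JVM how much time it reports having spent collecting. The sketch below uses the standard GarbageCollectorMXBean API; the class name GcActivity and the idea of comparing GC time with uptime are my own illustration, and the figure is only a rough indicator, since collectors differ in how they account for time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcActivity {

    /** Total milliseconds the JVM reports having spent in garbage collection since startup. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Reported GC time as a fraction of JVM uptime; a rough indicator only. */
    static double gcShareOfUptime() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : (double) totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("GC share of uptime: " + gcShareOfUptime());
    }
}
```

If the share is small and the collection counts barely move between two samples, garbage collection is unlikely to be what is keeping the CPU at 100%.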

Brandon