I have an application that connects to Hazelcast. Recently I noticed that requests to Hazelcast eventually become unresponsive, so I took a thread dump of the Hazelcast process. Comparing thread dumps from the development and production environments, I found that the pool threads waiting for tasks are in different states in each environment.
On the production servers, 337 of the 500 threads are reported as BLOCKED. In the development environment, no threads are blocked: of 60 threads, about half are runnable and half are waiting.
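To compare the two environments using the same tool, the per-state counts could also be tallied from inside the JVM. The following is only a minimal sketch: the class name `ThreadStateTally` and the method `tally` are hypothetical names chosen for illustration, and the code would have to run inside the process being examined. It relies only on the standard `Thread.getAllStackTraces()` and `Thread.getState()` calls.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper (not part of Hazelcast): counts live threads per Thread.State.
public class ThreadStateTally {

    // Returns a map from each observed thread state to the number of live threads in it.
    static Map<Thread.State, Integer> tally() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getAllStackTraces() returns a snapshot of all live threads in this JVM.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per state that has at least one thread.
        tally().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

Running this in both environments would give counts in the same `java.lang.Thread.State` vocabulary, which may make the comparison easier than reading dumps produced in two different formats.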
Are those blocked threads waiting on a synchronized block that some other thread is holding indefinitely? Are 500 threads too many (some analyzers gave me a warning about this)? Is this what makes my application unresponsive?
What could cause this state, and how can I resolve it?
Thread dumps (Production):
Thread 120713: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
- java.util.concurrent.ForkJoinPool.awaitWork(java.util.concurrent.ForkJoinPool$WorkQueue, int) @bci=350, line=1824 (Compiled frame)
- java.util.concurrent.ForkJoinPool.runWorker(java.util.concurrent.ForkJoinPool$WorkQueue) @bci=44, line=1693 (Interpreted frame)
- java.util.concurrent.ForkJoinWorkerThread.run() @bci=24, line=157 (Interpreted frame)
Thread 120743: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
- java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2039 (Compiled frame)
- java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=442 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor.getTask() @bci=149, line=1074 (Compiled frame)
Thread 120743: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
- java.util.concurrent.locks.LockSupport.park() @bci=5, line=304 (Compiled frame)
- com.hazelcast.internal.util.concurrent.MPSCQueue.takeAll() @bci=83, line=231 (Compiled frame)
- com.hazelcast.internal.util.concurrent.MPSCQueue.take() @bci=12, line=153 (Compiled frame)
- com.hazelcast.client.spi.impl.ClientResponseHandlerSupplier$ResponseThread.doRun() @bci=17, line=164 (Compiled
Thread 128753: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
- java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=215 (Compiled frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) @bci=78, line=2078 (Compiled frame)
- java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=124, line=1093 (Compiled frame)
- java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take() @bci=1, line=809 (Compiled frame)
Thread dumps (Development):
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006c1a1bc38> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)