
My application is under heavy load, and below is the output I get from:

sudo -u tomcat jstack <java_process_id>

The thread below consumes messages from Kafka, and it has got stuck. Since this thread is in the WAITING state, no more Kafka messages are being consumed.

"StreamThread-3" #91 daemon prio=5 os_prio=0 tid=0x00007f9b5c606000 nid=0x1e4d waiting on condition [0x00007f9b506c5000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x000000073aad9718> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
    at ch.qos.logback.core.AsyncAppenderBase.put(AsyncAppenderBase.java:160)
    at ch.qos.logback.core.AsyncAppenderBase.append(AsyncAppenderBase.java:148)
    at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
    at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
    at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
    at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
    at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
    at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383)
    at ch.qos.logback.classic.Logger.error(Logger.java:538)
    at com.abc.system.solr.repo.AbstractSolrRepository.doSave(AbstractSolrRepository.java:316)
    at com.abc.system.solr.repo.AbstractSolrRepository.save(AbstractSolrRepository.java:295)
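
(For illustration: the parked frame is `java.util.concurrent.ArrayBlockingQueue.put`, which, as far as I understand, parks the calling thread while the queue is full. Below is a minimal, self-contained sketch of that behavior; the class and thread names are made up and are not part of my application.)

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.TimeUnit;

    // Hypothetical demo: a full ArrayBlockingQueue parks the thread calling put(),
    // which is the state shown in the stack trace above
    // (AsyncAppenderBase.put -> ArrayBlockingQueue.put -> Condition.await).
    public class FullQueueDemo {
        public static void main(String[] args) throws InterruptedException {
            ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
            queue.put("first");                       // fills the queue to capacity

            Thread producer = new Thread(() -> {
                try {
                    queue.put("second");              // blocks: WAITING (parking) on the notFull condition
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "demo-producer");
            producer.start();

            TimeUnit.SECONDS.sleep(1);
            System.out.println(producer.getState());  // typically prints WAITING, like StreamThread-3
            queue.take();                             // draining one element lets the producer proceed
            producer.join();
        }
    }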

I also found this post, WAITING at sun.misc.Unsafe.park(Native Method), but it didn't help in my case.

What else could I investigate to get more details in such a case?

  • In this stacktrace you can see that an error was captured in the method `AbstractSolrRepository.doSave()`, from where an attempt was made to log it. The logger passed the error to the appender, which tried to add it to its blocking queue. The thread then tries to acquire an internal lock from the queue, which it had not succeeded in doing when the thread dump was taken, most likely because the queue was full. Maybe you had an occurrence of cascading failures or an event which generated a burst of logs? – Alexandre Dupriez Mar 18 '18 at 22:32
  • Thanks Alex, I think you’re right; there were a lot of logger.error messages printed in the logs. I would like to understand this more deeply. Also, what should I do for this instance (what if I don’t want to kill the instance and redeploy? I mean, can I empty that queue or something externally?)? – Nirav Modi Mar 19 '18 at 01:11
  • Well, I'd probably try to address the error happening in `AbstractSolrRepository`? It seems this class is from your codebase? – Alexandre Dupriez Mar 19 '18 at 22:21
  • @AlexandreDupriez the thread is not trying to acquire the lock, but within `Condition.await`, so it’s definitely a full queue as the only possible call in `put` is `notFull.await()`… – Holger Mar 20 '18 at 13:50
  • @NiravModi when you say you have a “lot of logger.error messages printed in logs”, it implies that you are repeatedly producing the error again, so emptying the queue once wouldn’t help, as it would fill up again. Normally, it shouldn’t be a problem when the thread is in a waiting state, as that implies that there are now more resources for the log handler thread(s) to process the queued messages, so the initiating thread will eventually proceed. But the primary problem is the error that is happening at a high rate. – Holger Mar 20 '18 at 13:56
  • @Holger Yes - sorry for the confusion, the thread is indeed waiting for a signal. – Alexandre Dupriez Mar 20 '18 at 14:11
  • Thanks @Holger @AlexandreDupriez! – Nirav Modi Mar 21 '18 at 17:21
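
Following up on the diagnosis in the comments above (the `AsyncAppender` event queue is full and `put` is waiting on the `notFull` condition): one way to keep application threads from blocking is to configure the async appender to drop events instead of waiting. Below is a hedged sketch using logback's programmatic API; the queue size, appender and pattern are placeholder values, and `neverBlock` is only available in reasonably recent logback versions (the XML equivalents are the `queueSize`, `discardingThreshold` and `neverBlock` properties).

    import org.slf4j.LoggerFactory;

    import ch.qos.logback.classic.AsyncAppender;
    import ch.qos.logback.classic.LoggerContext;
    import ch.qos.logback.classic.encoder.PatternLayoutEncoder;
    import ch.qos.logback.classic.spi.ILoggingEvent;
    import ch.qos.logback.core.ConsoleAppender;

    // Hedged sketch (placeholder values): an AsyncAppender that never blocks the
    // application thread; when its queue is full, events are dropped instead.
    public class NonBlockingLoggingSetup {
        public static void configure() {
            LoggerContext ctx = (LoggerContext) LoggerFactory.getILoggerFactory();

            PatternLayoutEncoder encoder = new PatternLayoutEncoder();
            encoder.setContext(ctx);
            encoder.setPattern("%d %-5level [%thread] %logger - %msg%n");
            encoder.start();

            ConsoleAppender<ILoggingEvent> console = new ConsoleAppender<>();
            console.setContext(ctx);
            console.setEncoder(encoder);
            console.start();

            AsyncAppender async = new AsyncAppender();
            async.setContext(ctx);
            async.setQueueSize(8192);          // larger buffer to absorb log bursts (placeholder)
            async.setDiscardingThreshold(0);   // 0 = do not discard TRACE/DEBUG/INFO early
            async.setNeverBlock(true);         // drop events rather than block when the queue is full
            async.addAppender(console);
            async.start();

            ctx.getLogger(org.slf4j.Logger.ROOT_LOGGER_NAME).addAppender(async);
        }
    }

Note that dropping log events is a trade-off: the application threads keep running, but some error messages may be lost during bursts, so the underlying error rate still needs to be addressed.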

1 Answer


I also ran into the same problem, but luckily I resolved it by playing around with the size of the thread pool and the number of producers and consumers.

Check whether there is any way to configure the following:

  1. The size of your thread pool
  2. The number of consumers/producers (if that can be configured in Kafka)

Make sure the thread pool has enough threads to serve both the consumers and the producers; a rough configuration sketch follows.
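
As an illustration of point 2, assuming the consumer is a Kafka Streams application (which the `StreamThread-3` thread name suggests), the number of stream threads can be set via `num.stream.threads`. The application id and bootstrap servers below are placeholders.

    import java.util.Properties;

    import org.apache.kafka.streams.StreamsConfig;

    // Hedged sketch: sizing the number of Kafka Streams threads (placeholder values).
    public class StreamsTuning {
        public static Properties streamsProperties() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            // More stream threads means more Kafka consumers in this instance;
            // size this together with any downstream thread pools (e.g. Solr writes)
            // so neither side starves the other.
            props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
            return props;
        }
    }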

vsk.rahul