1

We are seeing the frequent JVM hung in Apache camel based application whenever there is a network/connection issue to the IBM MQ.

The logger clearly says the connection problem happened and Spring CachingConnectionFactory is trying to reset the underlying MQ connection. Looks like there is an hand-off issue between Spring and IBM MQ Lib when resetting the connection.

Jul 13, 2018 8:51:48 PM org.springframework.jms.connection.CachingConnectionFactory onException
WARNING: Encountered a JMSException - resetting the underlying JMS Connection
com.ibm.msg.client.jms.DetailedJMSException: JMSWMQ1107: A problem with this connection has occurred.
An error has occurred with the IBM MQ JMS connection.
Caused by: com.ibm.mq.MQException: JMSCMQ0001: IBM MQ call failed with compcode '2' ('MQCC_FAILED') reason '2009' ('MQRC_CONNECTION_BROKEN').
    at com.ibm.msg.client.wmq.common.internal.Reason.createException(Reason.java:203)
    ... 220 more
Caused by: com.ibm.mq.jmqi.JmqiException: CC=2;RC=2009

At exactly the same timestamp, the JVM gets hung and DMLC is no longer processing the messages. But I do see the 20 listeners count on the consumer queue.

I took a thread dump of the process and I see hung/blocked threads which causes the JVM to be hung.

enter image description here

Here is the JMSCCThreadPoolWorker stacktrace which is waiting because of blocked thread.

JMSCCThreadPoolWorker-727742 
Stack Trace is: 
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006d75964c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at com.ibm.msg.client.jms.internal.JmsSessionImpl$ReentrantDoubleLock.getExclusiveLock(JmsSessionImpl.java:4931)
at com.ibm.msg.client.jms.internal.JmsSessionImpl.stop(JmsSessionImpl.java:2521)
at com.ibm.msg.client.jms.internal.JmsSessionImpl.stop(JmsSessionImpl.java:2498)
at com.ibm.msg.client.jms.internal.JmsConnectionImpl.stop(JmsConnectionImpl.java:1263)
- locked <0x00000006ca146118> (a com.ibm.msg.client.jms.internal.State)
at com.ibm.mq.jms.MQConnection.stop(MQConnection.java:473)
at org.springframework.jms.connection.SingleConnectionFactory.closeConnection(SingleConnectionFactory.java:452)
at org.springframework.jms.connection.SingleConnectionFactory.resetConnection(SingleConnectionFactory.java:345)
- locked <0x00000006cfba30c8> (a java.lang.Object)
at org.springframework.jms.connection.CachingConnectionFactory.resetConnection(CachingConnectionFactory.java:207)
at org.springframework.jms.connection.SingleConnectionFactory.onException(SingleConnectionFactory.java:323)
at org.springframework.jms.connection.SingleConnectionFactory$AggregatedExceptionListener.onException(SingleConnectionFactory.java:673)
- locked <0x00000006cfba30c8> (a java.lang.Object)
at com.ibm.msg.client.jms.internal.JmsProviderExceptionListener.run(JmsProviderExceptionListener.java:413)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.runTask(WorkQueueItem.java:319)
at com.ibm.msg.client.commonservices.workqueue.SimpleWorkQueueItem.runItem(SimpleWorkQueueItem.java:99)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.run(WorkQueueItem.java:343)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueManager.runWorkQueueItem(WorkQueueManager.java:312)
at com.ibm.msg.client.commonservices.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWorker.run(WorkQueueManagerImplementation.java:1227)
Locked ownable synchronizers:
- <0x00000006ca05eb40> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

Here is the stacktrace of blocked thread:

Stack Trace is: 
java.lang.Thread.State: BLOCKED (on object monitor)
at org.springframework.jms.connection.SingleConnectionFactory.getConnection(SingleConnectionFactory.java:281)
- waiting to lock <0x00000006cfba30c8> (a java.lang.Object)
at org.springframework.jms.connection.SingleConnectionFactory.createConnection(SingleConnectionFactory.java:224)
at org.springframework.jms.connection.JmsTransactionManager.createConnection(JmsTransactionManager.java:288)
at org.springframework.jms.connection.JmsTransactionManager.doBegin(JmsTransactionManager.java:186)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.handleExistingTransaction(AbstractPlatformTransactionManager.java:429)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:349)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
at org.apache.camel.spring.spi.TransactionErrorHandler.doInTransactionTemplate(TransactionErrorHandler.java:176)
at org.apache.camel.spring.spi.TransactionErrorHandler.processInTransaction(TransactionErrorHandler.java:136)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:105)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.component.direct.DirectProducer.process(DirectProducer.java:62)
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:145)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
at org.apache.camel.processor.interceptor.TraceInterceptor.process(TraceInterceptor.java:163)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:120)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)
at org.apache.camel.processor.FatalFallbackErrorHandler.process(FatalFallbackErrorHandler.java:81)
at org.apache.camel.processor.RedeliveryErrorHandler.deliverToFailureProcessor(RedeliveryErrorHandler.java:1057)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:468)
at org.apache.camel.spring.spi.TransactionErrorHandler.processByErrorHandler(TransactionErrorHandler.java:220)
at org.apache.camel.spring.spi.TransactionErrorHandler$1.doInTransactionWithoutResult(TransactionErrorHandler.java:183)
at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:34)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
at org.apache.camel.spring.spi.TransactionErrorHandler.doInTransactionTemplate(TransactionErrorHandler.java:176)
at org.apache.camel.spring.spi.TransactionErrorHandler.processInTransaction(TransactionErrorHandler.java:136)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:105)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:172)
at org.apache.camel.component.direct.DirectProducer.process(DirectProducer.java:62)
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:145)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
at org.apache.camel.processor.interceptor.TraceInterceptor.process(TraceInterceptor.java:163)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:542)
at org.apache.camel.spring.spi.TransactionErrorHandler.processByErrorHandler(TransactionErrorHandler.java:220)
at org.apache.camel.spring.spi.TransactionErrorHandler$1.doInTransactionWithoutResult(TransactionErrorHandler.java:183)
at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:34)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
at org.apache.camel.spring.spi.TransactionErrorHandler.doInTransactionTemplate(TransactionErrorHandler.java:176)
at org.apache.camel.spring.spi.TransactionErrorHandler.processInTransaction(TransactionErrorHandler.java:136)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:105)
at org.apache.camel.spring.spi.TransactionErrorHandler.process(TransactionErrorHandler.java:114)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:197)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:120)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:83)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:197)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.component.jms.EndpointMessageListener.onMessage(EndpointMessageListener.java:112)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:721)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:681)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:651)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:317)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:235)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1166)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1060)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00000006d0b19600> (a java.util.concurrent.ThreadPoolExecutor$Worker)
Gary Russell
  • 166,535
  • 14
  • 146
  • 179
Selvakumar
  • 189
  • 1
  • 13
  • You don't mention which version of IBM MQ jar files you are using, if they are anything less than the current maintenance level of MQ v8 or MQ v9 you should upgrade. Google "IBM MQ v8 fixlist" or "IBM MQ v9 fixlist" to see brief descriptions of issues fixed at the levels from yours to the current, I recall various hang conditions have been addressed. – JoshMc Jul 19 '18 at 06:26
  • I am using the latest version of IBM MQ jars available in Fix central - 9.0.4.0 – Selvakumar Jul 19 '18 at 14:41
  • 1
    There is 9.0.5 and 9.0.0.4 that both were released after 9.0.4. – JoshMc Jul 19 '18 at 14:49
  • I found the version 9.0.5 in Fix central. Looks like the version 9.0.5 addressed some hung issues. http://www-01.ibm.com/support/docview.wss?uid=swg1IT22127 and interestingly this issue happens only with shared conversation of more than 1 (we have a shared conv of 10) – Selvakumar Jul 19 '18 at 16:41

1 Answers1

2

You need to figure out which thread owns ReentrantLock < 0x00000006d75964c0>; it won't show in the stack trace.

It's probably a deadlock in the IBM client - either a thread has failed to unlock it or they have a lock order problem.

You might look for a thread that is blocked trying to synchronize on

- waiting to lock <0x00000006ca146118> (a com.ibm.msg.client.jms.internal.State)

It could be that that thread has already acquired the ReentrantLock. If no such thread exists then it's likely the former case (failure to unlock).

In any case, the deadlock appears to be in the IBM code. Tough to debug since (last time I looked) the MQ client is closed-source. You might need to open a ticket with IBM; assuming you have support there.

Gary Russell
  • 166,535
  • 14
  • 146
  • 179
  • 2
    I would add, try upgrading to a later fix pack version as the problem may already have been fixed. – Tim McCormick Jul 19 '18 at 14:12
  • @Gary - I did work with IBM on this issue. They pointed out that the framework is trying to reconnect every 5 secs. So they asked me to increase the reconnect interval above the heartbeat interval of MQ server (30 secs). I did make that change with no success. – Selvakumar Jul 19 '18 at 14:52