
We are using Solace as the message bus between modules and subsystems. Our application is built on Spring Boot and Spring Integration (message-driven-channel-adapter, DefaultMessageListenerContainer, CachingConnectionFactory).

We are observing random slowness, in 10-15 minute stretches, happening once every few days. In some cases, based on the logs, a message that only travels from the sender in module-1 to the receiver in module-2 takes 15 minutes, with no service activator in between.

Has anyone had a similar issue? Any advice on troubleshooting it?

Ramprabhu
  • I would suggest turning on `DEBUG` logging for the `org.springframework.integration` category. This way you will see a lot of `preSend` and `postSend` and other useful Spring Integration logs. That should give you some clues about where your system is stuck. Also make sure that everything is good with memory and GC; use VisualVM on the matter. – Artem Bilan May 07 '18 at 14:32
  • Thank you Artem for the good suggestion; I will enable debug logs for the specific JMS packages. As for memory, I have verified it is well below the max limit. I have also checked the load during this time frame, and it is negligible. – Ramprabhu May 07 '18 at 14:40
  • In our code we are using dynamic scaling with CachingConnectionFactory. That could be the issue, as explained in this thread: [link](https://stackoverflow.com/questions/21984319/why-defaultmessagelistenercontainer-should-not-use-cachingconnectionfactory) – Ramprabhu May 16 '18 at 16:38
  • The logs confirm the same behavior. Say Listener-4 is used at 12:01 and again at 12:17; in between it is not used at all. It seems Listener-4 consumed the message but never passed it to the channel adapter, because of the caching in CachingConnectionFactory. – Ramprabhu May 16 '18 at 16:44
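
To apply the `DEBUG` suggestion from the comments in a Spring Boot application, the logging levels can be raised in `application.properties`. A minimal sketch, assuming Spring Boot's standard logging properties; the two categories below are the stock Spring packages:

```
# Spring Integration channel logs (preSend/postSend) reveal where a message stalls
logging.level.org.springframework.integration=DEBUG
# JMS listener container and connection logs
logging.level.org.springframework.jms=DEBUG
```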

1 Answer


This turned out to be a really good issue that helped me understand a few aspects of Spring Integration and JMS. I am writing this answer hoping that someone with a similar issue finds it useful. There was more than one problem in our scenario; below are the highlights:

  • Removing the CachingConnectionFactory from the DefaultMessageListenerContainer (DMLC) solved the problem to a large extent (see the configuration sketch after this list).
  • We use a blocking flow from the DMLC through to flow completion. Under certain scenarios our service activator got stuck during persistence.
  • Our message broker is set with a prefetch of 18, which allows up to 18 messages to be delivered to the consumer without acknowledgement. When one message got stuck in our service activator, the flow was blocked, but the JMS consumer kept prefetching up to 18 messages without passing them on to the DMLC. We reduced the prefetch to 1, which cut the number of stuck messages down to one.
  • We suspected that the MySQL connection was being dropped after a period of inactivity, as explained in the thread "MySQL database drops connection after 8 hours. How to prevent it?". We added a datasource keep-alive by configuring a validation query (see the datasource sketch below). This reduced our stuck messages to zero.
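
For reference, here is a minimal sketch of the split we ended up with. It is an illustration rather than our exact configuration: it assumes the raw vendor (Solace) connection factory is exposed as a bean named `solaceConnectionFactory`, and the queue name is hypothetical. The DMLC gets the raw factory so it can manage and scale its own consumers, while the producer side keeps a CachingConnectionFactory, where caching is safe:

```java
import javax.jms.ConnectionFactory;
import javax.jms.MessageListener;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jms.connection.CachingConnectionFactory;
import org.springframework.jms.core.JmsTemplate;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

@Configuration
public class JmsConfig {

    // The DMLC caches connections and consumers itself; wrapping its factory
    // in a CachingConnectionFactory hides consumers from its scaling logic.
    @Bean
    public DefaultMessageListenerContainer listenerContainer(
            ConnectionFactory solaceConnectionFactory) { // raw vendor factory (assumed bean)
        DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
        container.setConnectionFactory(solaceConnectionFactory);
        container.setDestinationName("module2/inbound");   // hypothetical queue name
        container.setConcurrentConsumers(1);
        container.setMaxConcurrentConsumers(5);            // dynamic scaling
        container.setMessageListener((MessageListener) message -> {
            // in the real app, the message is handed to the Spring Integration channel adapter
        });
        return container;
    }

    // Caching still pays off on the producer side, where JmsTemplate would
    // otherwise open and close a connection for every send.
    @Bean
    public JmsTemplate jmsTemplate(ConnectionFactory solaceConnectionFactory) {
        return new JmsTemplate(new CachingConnectionFactory(solaceConnectionFactory));
    }
}
```

With Spring Integration's message-driven-channel-adapter, the same effect is achieved by pointing the adapter's `container` attribute at a DMLC built on the raw factory.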
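And a sketch of the datasource keep-alive, assuming the Tomcat JDBC pool (the Spring Boot 1.x default); URL, credentials, and driver are placeholders. With `testOnBorrow` plus a validation query, the pool probes a connection before handing it out, so connections silently dropped by MySQL after its `wait_timeout` are discarded instead of hanging the flow:

```java
import org.apache.tomcat.jdbc.pool.DataSource;
import org.apache.tomcat.jdbc.pool.PoolProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DataSourceConfig {

    @Bean
    public javax.sql.DataSource dataSource() {
        PoolProperties props = new PoolProperties();
        props.setUrl("jdbc:mysql://localhost:3306/appdb"); // placeholder
        props.setDriverClassName("com.mysql.jdbc.Driver"); // placeholder
        props.setUsername("app");                          // placeholder
        props.setPassword("secret");                       // placeholder
        props.setTestOnBorrow(true);            // validate before each checkout
        props.setValidationQuery("SELECT 1");   // cheap MySQL liveness probe
        props.setValidationInterval(30000);     // re-validate at most every 30s
        return new DataSource(props);
    }
}
```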
Ramprabhu