
I have been fighting with this issue for ages and I cannot for the life of me figure out what the problem is. Let me set the stage for the stack we are using:

  • Web-based Java 8 application
  • GWT
  • Hibernate 4.3.11
  • MySQL
  • MongoDB
  • Spring
  • Tomcat 8 (including Tomcat connection pooling instead of, for example, c3p0)
  • Hibernate Search / Lucene
  • Terracotta and EhCache

The problem is that every couple of days (sometimes every second day, sometimes once every 10 days, it varies), in the early hours of the morning, our application "locks up". To clarify: it does not crash, you just cannot log in or do anything, for that matter. All background tasks - everything - just halt. If we attempt to log in when it is in this state, we can see in our log file that it is authenticating us as a valid user, but no response is ever sent, so the application just "spins".

The only pattern we have found to date is that these "lock ups" occur while our morning scheduled tasks or SAP imports are running. It is not always the same process that is running, though: sometimes the lock up happens during one of our SAP imports and sometimes during internal scheduled task execution. All these things have in common is that they run outside of business hours (between 1am and 6am) and that they are quite intensive processes.

We are using JavaMelody for monitoring and what we see every time is that, starting at different times in this 1 - 6am window, the number of used JDBC connections just starts to spike (as per the attached image). Once that starts, it is just a matter of time before the lock up occurs, and the only way to solve it is to bounce Tomcat, thereby restarting the application.

As far as I can tell, memory, CPU, etc. are all fine when the lock up occurs; the only thing that looks like it has an issue is the constantly increasing number of used JDBC connections.

I have checked the code for our transaction management many times to ensure that transactions are being closed off correctly (the transaction management code is quite old fashioned: explicit begin and commit in the try block, rollback in the catch blocks and an entity manager close in a finally block). It all seems correct to me, so I am really, really stumped. In addition to this, I have also recently explicitly set the Hibernate connection release mode to after_transaction, but the issue still occurs. A simplified sketch of the pattern is below.
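For reference, the pattern looks roughly like this (a simplified sketch; the variable names are illustrative, not our actual code):

    // Requires: javax.persistence.EntityManager / EntityManagerFactory
    EntityManager em = entityManagerFactory.createEntityManager();
    try {
        em.getTransaction().begin();
        // ... persistence work ...
        em.getTransaction().commit();
    } catch (RuntimeException e) {
        if (em.getTransaction().isActive()) {
            em.getTransaction().rollback();
        }
        throw e;
    } finally {
        em.close(); // should return the underlying connection to the pool
    }

The release mode is set via the hibernate.connection.release_mode property with the value after_transaction in our persistence configuration.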

The other weird thing is that we run several instances of the same application for different clients and this issue only happens regularly for one client. They are the client with by far the most data to be processed, though, and although all clients run these scheduled tasks, this big client is the only one with SAP imports. That is why I originally thought that the SAP imports were the issue, but it locked up just after 1am this morning and that was a couple of hours before the imports even start running. In this case it locked up while an internal scheduled task was executing.

Does anyone have any idea what could be causing this strange behavior? I have looked into everything I can think of but to no avail.

[Image: JavaMelody graph showing the spike in used JDBC connections]

brent777
  • You say the transactions are being closed off, but are you closing the entity managers? You also need to close them in order to release the connection back to the pool. – coladict Oct 10 '16 at 09:03
  • @coladict I don't think that's true because, as I mentioned, I set the connection release mode to after transaction so it should release them on commit / rollback as far as I understand it from the docs. Anyway, it is a moot point since I did say in my question that I close the entity manager in a finally block. – brent777 Oct 10 '16 at 09:18
  • 1
    Well, there is the "Opened jdbc connections" link down in the "System information" section of the monitoring page to help you find any leaks. If there are any open connections, you will get the stacktrace of where the connection was created from, the peak of which will likely be somewhere in Hibernate, but you just have to go down until you reach your classes. – coladict Oct 10 '16 at 10:44
  • @coladict I can't see the history of that info though. When the application is locked up I cannot access /monitoring and once I have bounced it then that "real time" data is gone and I can only access the historical data shown on the graphs. Well at least as far as I know, unless you know a way I can get the data you are referring to after I have bounced the app / Tomcat. – brent777 Oct 10 '16 at 11:34
  • Try to get a thread dump using jconsole, VisualVM or command-line tools like jmap or jstack. Then analyze the thread dump using a thread stack analyzer (you can also do it with a text editor) and see who is holding the connections. – Sangram Jadhav Oct 10 '16 at 11:52
  • @SangramJadhav I have tried that before, but timing is a bit of an issue since it happens during the early hours and you never know which night it is going to happen. I might need to try to set something up that detects a lock up and takes a thread dump automatically for me, something like the sketch below. – brent777 Oct 11 '16 at 06:18
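    (A rough, hypothetical sketch of that idea using only standard JDK APIs; the lock-up detection itself would still need to be wired up separately:)

        // Requires: java.lang.management.*, java.nio.file.*, java.nio.charset.StandardCharsets
        // (throws IOException, to be handled by the caller)
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder dump = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            dump.append(info.toString()).append('\n'); // ThreadInfo.toString() truncates very deep stacks
        }
        Files.write(Paths.get("thread-dump-" + System.currentTimeMillis() + ".txt"),
                dump.toString().getBytes(StandardCharsets.UTF_8));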

2 Answers


After some time and a lot of trial and error, my team and I managed to sort out this issue. It turns out that the spike in JDBC connections was not the cause of the lock-ups but rather a consequence of them. Terracotta was the culprit; it seems it was simply becoming unresponsive. It might have been a resource allocation issue, but I doubt it, since this was happening on low-usage servers as well and they had more than enough resources available.

Fortunately we no longer actually needed Terracotta, so I removed it. As I said in the question, we were getting these lock-ups every couple of days - at least once per week, every week. Since removing it we have had no such lock-ups for 4 months and counting. So if anyone else experiences the same issue and you are using Terracotta, try dropping it and things might come right, as they did in my case.

brent777

As coladict said, you need to look at the "Opened jdbc connections" page in the JavaMelody monitoring report, before the server "locks up".

Sorry if you need to do that at 2 or 3 in the morning, but perhaps you can run a wget command automatically during the night; a rough sketch of automating that from Java is below.
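If a cron job with wget is not convenient, something in the spirit of the following could save the monitoring page every few minutes during the night (a rough Java sketch; the URL, interval and output path are placeholders, and the /monitoring page may require authentication in your setup):

    import java.io.InputStream;
    import java.net.URL;
    import java.nio.file.*;
    import java.util.concurrent.*;

    public class MonitoringSnapshot {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() -> {
                try (InputStream in = new URL("http://localhost:8080/monitoring").openStream()) {
                    // Save each snapshot with a timestamp so the last one before a lock-up survives a restart
                    Path out = Paths.get("monitoring-" + System.currentTimeMillis() + ".html");
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                } catch (Exception e) {
                    e.printStackTrace(); // keep going even if one fetch fails
                }
            }, 0, 15, TimeUnit.MINUTES);
        }
    }

You can then inspect the last saved snapshot after bouncing Tomcat, even though the live data is gone.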

evernat