
One of the threads holds a lock for more than 3 seconds while querying Oracle Database. This causes many blocked threads accessing the Oracle database, and hence sudden increases in the number of threads and unresponsiveness of the application. I am using Tomcat 8.5, the Tomcat JDBC connection pool, and Java 8. Trace for the blocking thread:

***"http-nio-80-exec-433" #4207 daemon prio=5 os_prio=0 tid=0x00007fd9d8042000 nid=0x503b runnable [0x00007fd839f04000]
   java.lang.Thread.State: RUNNABLE
    at java.util.Hashtable.get(Hashtable.java:363)
    - locked <0x000000070193caa0> (a java.util.Hashtable)
    at java.lang.ConditionalSpecialCasing.lookUpTable(ConditionalSpecialCasing.java:151)
    at java.lang.ConditionalSpecialCasing.toUpperCaseEx(ConditionalSpecialCasing.java:123)
    at java.lang.String.toUpperCase(String.java:2775)
    at java.lang.String.toUpperCase(String.java:2833)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1638)
    at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4401)
    at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:4482)
    - locked <0x000000074cd7d868> (a oracle.jdbc.driver.T4CConnection)
    at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:6272)
    at sun.reflect.GeneratedMethodAccessor400.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.tomcat.jdbc.pool.interceptor.AbstractQueryReport$StatementProxy.invoke(AbstractQueryReport.java:210)
    at com.sun.proxy.$Proxy637.executeQuery(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor400.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:114)
    at com.sun.proxy.$Proxy637.executeQuery(Unknown Source)
    at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208)
    at org.hibernate.loader.Loader.getResultSet(Loader.java:1953)
    at org.hibernate.loader.Loader.doQuery(Loader.java:802)
    at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
    at org.hibernate.loader.Loader.doList(Loader.java:2533)
    at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2276)
    at org.hibernate.loader.Loader.list(Loader.java:2271)

Here is the trace of one of the 10+ BLOCKED threads:

***"http-nio-80-exec-271" #2777 daemon prio=5 os_prio=0 tid=0x00007fd9c8941800 nid=0x19c3 waiting for monitor entry [0x00007fd8356ca000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.util.Hashtable.get(Hashtable.java:363)
    - waiting to lock <0x000000070193caa0> (a java.util.Hashtable)
    at java.lang.ConditionalSpecialCasing.lookUpTable(ConditionalSpecialCasing.java:151)
    at java.lang.ConditionalSpecialCasing.toUpperCaseEx(ConditionalSpecialCasing.java:123)
    at java.lang.String.toUpperCase(String.java:2775)
    at java.lang.String.toUpperCase(String.java:2833)
    at oracle.jdbc.driver.CharCommonAccessor.init(CharCommonAccessor.java:164)
    at oracle.jdbc.driver.VarcharAccessor.<init>(VarcharAccessor.java:88)
    at oracle.jdbc.driver.T4CVarcharAccessor.<init>(T4CVarcharAccessor.java:108)
    at oracle.jdbc.driver.T4CTTIdcb.fillupAccessors(T4CTTIdcb.java:431)
    at oracle.jdbc.driver.T4CTTIdcb.receiveCommon(T4CTTIdcb.java:209)
    at oracle.jdbc.driver.T4CTTIdcb.receive(T4CTTIdcb.java:145)
    at oracle.jdbc.driver.T4C8Oall.readDCB(T4C8Oall.java:963)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:447)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:235)
    at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:543)
    at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:239)
    at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:1246)
    at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1500)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1717)
    at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4401)
    at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:4482)
    - locked <0x000000074d203f60> (a oracle.jdbc.driver.T4CConnection)
    at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:6272)
    at sun.reflect.GeneratedMethodAccessor400.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.tomcat.jdbc.pool.interceptor.AbstractQueryReport$StatementProxy.invoke(AbstractQueryReport.java:210)
    at com.sun.proxy.$Proxy637.executeQuery(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor400.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:114)
    at com.sun.proxy.$Proxy637.executeQuery(Unknown Source)
    at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208)

I have no idea why toUpperCase() would lock something (the dump shows a java.util.Hashtable being locked) for 30+ seconds, but this keeps occurring multiple times per day. Thread dump analysers did not find any deadlocks in the dump. The Tomcat pool logs show that the query for the blocking thread http-nio-80-exec-433 took 5 minutes to complete.

Could this be a problem with the JVM, memory, or something else, such as the JDBC driver or the connection pool configuration?

Vaidotas
  • Could you please post the block of code where the blocking happens? Usually it would block due to a database call and the lock on toUpperCase is just a symptom. – fpezzini Nov 26 '19 at 13:02
  • You're looking here at IO (DB, network) calls that could block or blocks of code within synchronized blocks/methods, or maybe a Semaphore or one of the other more specialized threading classes. For instance if your database is set for row locking in serialized mode your code will have to wait anytime some other user of the application is querying that same row/table. – fpezzini Nov 26 '19 at 13:04
  • The DB admin sometimes notices that there are BLOCKED Oracle sessions when this happens, and says that a commit is missing somewhere, hence the blocked select statements. I find it strange that a missing commit could block sessions, as I would expect something like an optimistic lock exception in this case. The removeAbandoned timeout in the pool is set to 1 hour, and neither rollbackOnReturn nor commitOnReturn is set. – Vaidotas Nov 26 '19 at 13:15
  • It's a shame because *it appears that* there is no reason for a lock there - an apparently effectively immutable `Hashtable` might as well be a `HashMap`. It looks like it gets locked frequently but not for long. I don't know HotSpot or other JVMs options well enough to know if there is a better configuration for extremely frequently used locks. Even if your SQL query strings were upper case, it still gets used a lot. (You are using the no-arg `executeQuery`, right?) – Tom Hawtin - tackline Nov 26 '19 at 13:30
  • What is the full version of the Java you're using? – Mark Rotteveel Nov 26 '19 at 13:50
  • Java(TM) SE Runtime Environment 1.8.0_181-b13, Java HotSpot(TM) 64-Bit Server VM – Vaidotas Nov 26 '19 at 14:02
  • Now I see that it is not clear whether it was 30 seconds - I ran the kill -3 command 5 times over 30 seconds, but the log shows that only 3 seconds had passed, probably because of some lag. Nevertheless, the time is too long. – Vaidotas Nov 26 '19 at 14:05
  • The select was (column and table names have been changed): SQL=select * from ( select count(*) as col_0_0_ from NUMBERS_VW eknumb0_, EK_DEPARTMENTS_VW ekdepart1_ where eknumb0_.DEPARTMENT_ID=ekdepart1_.IDD_CODE and 1=1 and (upper(eknumb0_.SERIES) like upper(?)) and (upper(ekdepart1_.IDD_NAME) like upper(?)) and (eknumb0_.PA_NUMBER is null) ) where rownum <= ?; time=301772 ms; – Vaidotas Nov 26 '19 at 14:15
  • Could it be a portion of the code that is not rolling back the transaction when an exception happens? That would match what the DB admin is seeing. This is very likely Oracle blocking and the Java code is left waiting. – fpezzini Nov 26 '19 at 14:37
  • There's further information on `ConditionalSpecialCasing.entryTable` lock contention in another question. I've added a +500 bounty to that post if anybody wants the rep. https://stackoverflow.com/q/31987777/4725 – Tom Hawtin - tackline Nov 26 '19 at 16:18
  • I'm betting your default Locale is "lt" (Lithuanian). It's one of three languages that require special casing rules, the others being "tr" (Turkish) and "az" (Azerbaijani). For other locales ConditionalSpecialCasing is not used. So unless you need the locale-specific casing you could simply force your default Locale to something else and avoid this lock contention. – bowmore Nov 27 '19 at 13:53
  • You are correct about the Locale. What is strange is that, looking at the source code of ConditionalSpecialCasing, none of the special letters are actually Lithuanian letters, and why only these 3 languages get special treatment remains a mystery. Changing the whole locale could possibly fix the bug, but may introduce a lot of extra work in other parts of the system (see the sketch after these comments). – Vaidotas Nov 27 '19 at 14:29
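
Not part of the original exchange, but a minimal sketch that can be used to check bowmore's observation (the class name, test string, and thread/iteration counts below are made up for illustration): with a "lt", "tr", or "az" locale, String.toUpperCase() routes every character through ConditionalSpecialCasing, whose lookup table is a single synchronized Hashtable shared by all threads, so heavy concurrent upper-casing should be measurably slower than with an English locale.

    import java.util.Locale;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class LocaleUpperCaseContention {

        // Upper-cases a SQL-like string many times from many threads and
        // returns the elapsed time in milliseconds.
        static long run(Locale locale, int threads, int iterations) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            CountDownLatch done = new CountDownLatch(threads);
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                pool.execute(() -> {
                    for (int i = 0; i < iterations; i++) {
                        // With a "lt"/"tr"/"az" locale every character goes through
                        // ConditionalSpecialCasing.toUpperCaseEx(), which reads one
                        // shared synchronized Hashtable; with ENGLISH it does not.
                        "select series from numbers_vw where department_id = ?".toUpperCase(locale);
                    }
                    done.countDown();
                });
            }
            done.await();
            pool.shutdown();
            return (System.nanoTime() - start) / 1_000_000;
        }

        public static void main(String[] args) throws InterruptedException {
            int threads = 32, iterations = 20_000;
            System.out.println("lt: " + run(new Locale("lt"), threads, iterations) + " ms");
            System.out.println("en: " + run(Locale.ENGLISH, threads, iterations) + " ms");
        }
    }

On a multi-core machine the "lt" run would be expected to take noticeably longer than the "en" run because of the lock contention; actual numbers will vary.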

1 Answer


It appears the problem was not code related. We had a 10 GB catalina.out log file and 4 bash scripts that analyzed that file for specific errors every five minutes; because of the large file size, each such analysis (mostly tail/wc commands) took 3-4 minutes. I do not know whether catalina.out was being locked, but CPU usage for the tail and wc commands was quite significant. Memory usage did not increase significantly, though.

After manually rotating catalina.out, the problem is gone. The admin has been tasked with figuring out why logrotate is failing.

Update: The problem kept reappearing under higher loads (>50 users), so after some testing, the locale was changed from "lt" to "en". Together with fixing another MyFaces cache bug, response times from Tomcat decreased 10-20 times, and the number of concurrent users the application can handle increased more than 10 times.
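
For reference, one way such a locale change might be applied (a sketch under the assumption that Lithuanian-specific casing is not needed elsewhere, and not necessarily how it was done in this deployment): either pass -Duser.language=en -Duser.country=US via CATALINA_OPTS in setenv.sh, or set the default locale programmatically before any requests are served, for example with a servlet context listener. The listener class name below is made up.

    import java.util.Locale;
    import javax.servlet.ServletContextEvent;
    import javax.servlet.ServletContextListener;
    import javax.servlet.annotation.WebListener;

    // Hypothetical listener: forces a non-"lt" default locale at startup so the
    // Oracle driver's internal toUpperCase() calls skip ConditionalSpecialCasing.
    // Only safe if locale-specific Lithuanian casing is not needed elsewhere.
    @WebListener
    public class DefaultLocaleListener implements ServletContextListener {

        @Override
        public void contextInitialized(ServletContextEvent sce) {
            Locale.setDefault(Locale.ENGLISH);
        }

        @Override
        public void contextDestroyed(ServletContextEvent sce) {
            // nothing to clean up
        }
    }

Note that Locale.setDefault affects the whole JVM, so date and number formatting elsewhere may change; if that matters, the formatting locale can afterwards be set back per category with Locale.setDefault(Locale.Category.FORMAT, ...), since the no-argument toUpperCase() only consults the plain default locale.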

Vaidotas