1

We have a multi-threaded java application with a Web UI and REST API that is compiled using java 6 and runs in tomcat 6. During operations it uses OJDBC to access its Oracle DB millions of times a day. Once every two or three months one of the DB queries hangs and never returns, which causes part of the application to stop processing and a backlog to get created. Other threads are able to communicate with the DB and do their work, only one thread gets hung, which unfortunately stops file processing.

A thread dump shows that the thread is reading from a socket which never times out nor gets closed:

"FileUpload" daemon prio=10 tid=0x00002b8e60617800 nid=0xf9e runnable [0x00002b8e5e10b000]
java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at oracle.net.ns.Packet.receive(Packet.java:311)
    at oracle.net.ns.DataPacket.receive(DataPacket.java:103)
    at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:312)
    at oracle.net.ns.NetInputStream.read(NetInputStream.java:257)
    at oracle.net.ns.NetInputStream.read(NetInputStream.java:182)
    at oracle.net.ns.NetInputStream.read(NetInputStream.java:99)
    at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:121)
    at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:77)
    at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1173)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:309)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:200)
    at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:543)
    at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:238)
    at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:1244)
    at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1492)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1710)
    at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4372)
    at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:4453)
- locked <0x00002b8e1c2d7010> (a oracle.jdbc.driver.T4CConnection)
    at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:6270)
    at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
    at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
    at xxx.OracleFileInfoDAO.getFilesToUpload(OracleFileInfoDAO.java:874)

When this happens the DBAs have looked on the DB server and do not see a long running query. The solution is to recycle tomcat which resolves the issue but I like to find out if there is a programmatic way to handle this. I have seen appends that refer to similar issues that are resolved by recycling the LINUX box that the DB server is running on, but that will not be an option for us; I need a fix at the application level.

The DB resource is defined using:

<Resource auth="Container" description="Oracle Datasource" name="xxx" scope="shareable" type="javax.sql.DataSource" url="jdbc:oracle:thin:@xxx:1521/xxx"  driverClassName="oracle.jdbc.driver.OracleDriver" username="xxx" password="xxx" maxWait="5000" maxActive="100" maxIdle="20" removeAbandoned="true" testOnReturn="true" testOnBorrow="true" validationQuery="select 1 from dual" />   

OJDBC driver used is: ojdbc6_g-11.2.0.4.0.jar

The DB version is: 11.2.0.3.0

The java code performing the query is:

                con = CSAConnectionManager.getConnection();                     
            StringBuilder strBuf = new StringBuilder(SQL_SELECT_FILE_INFO_TO_UPLOAD);
            ps = con.prepareStatement(strBuf.toString());
            ps.setString( 1, hostname );
            ps.setString( 2, containerId );
            ps.setMaxRows( maxRows );

            Date before = new Date();
            ResultSet rs = ps.executeQuery();

This is the source for getConnection():

    public static Connection getConnection() throws Exception
{
    return instance.getInstanceConnection();
}

public Connection getInstanceConnection() throws Exception
{
    Connection con = null;
    if(ds != null)
    {
        con = ds.getConnection();
    }
    else
    {
        String dburl = wrapper.getDBUrl();
        String username = wrapper.getDBUserName();
        String password = wrapper.getDBPassword();      
        String driverClass = wrapper.getDBDriverClass();
        Class.forName(driverClass).newInstance();

        con = DriverManager.getConnection(dburl,username,password);
    }       
    con.setAutoCommit(false);
    return con;
}

“ds” is defined as: private static DataSource ds = null; And is initialized using:

        Context initContext = new InitialContext();
        ds = (DataSource)initContext.lookup(wrapper.getCSADBJNDIName());
user3722575
  • 71
  • 1
  • 2
  • 8

2 Answers2

2

In my experience, this is typically a network error. Your query has completed, but your client is still blocking on a network response it will never receive. This is why bouncing the app server works as it resets everything in the app server, but bouncing the DB server makes no sense since it is not a DB issue. Take a look this question/answer on this site...

Question on network timeouts

Community
  • 1
  • 1
unleashed
  • 771
  • 3
  • 9
  • I agree @unleashed, possible network error, but could also be slow client side processing or also too small an array fetch size which would cause too many network round trips ["SQL*Net is a chatty protocol"]. In this case, bouncing the app server kills the offending idle session which is the bottleneck. – Roger Cornejo May 12 '17 at 13:29
  • Thank you both for the replies and the references, very useful. The network error seems much more plausible in this case. As the code above shows, tomcat DB resource is being used. Is there a way to define a timeout in tomcat instead of resorting to use DriverManager? – user3722575 May 15 '17 at 17:26
1

Check to see if the session is:

  1. inactive - if inactive, what is the wait state, it may be waiting for client or network.
  2. there are any blocking locks involving the idle session (e.g. it's holding some locks on uncommitted transactions).

Also check for connection storms (i.e. too many sessions hitting the instance can cause serious CPU issues)

Roger Cornejo
  • 1,507
  • 1
  • 8
  • 7
  • The DBA does not see the session/query on the server. Thread dump from my app shows: "FileUpload" daemon prio=10 tid=0x00002af6d42c8000 nid=0x369 runnable [0x00002af6d87fe000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) I am very certain that it is not a connection storm issue. I added these two to tomcat resource definition but see no log entry as a result: validationQueryTimeout="3600" suspectTimeout="60" Hoping for any input – user3722575 Jun 02 '17 at 15:55
  • Full resource definition: – user3722575 Jun 02 '17 at 16:28