I am trying to figure out why our Oracle 11g has been misbehaving. We are getting the following error in all our jobs which access the database with JDBC:
Error message: java.sql.SQLRecoverableException: IO Error: Connection reset
Before anyone gets upset, yes, I have checked the other related links on stack overflow (and goodness only knows how many other sites). Nothing so far has helped and I'm wondering what I could try next.
The situation has worsened over the past few months and the only thing which has changed is that the jobs which access the database are running in docker containers. The only affected database is the oracle 11g (we have several other databases including oracle 12c). I can access the 11g database using sqlplus without difficulty.
Since this is a connectivity thing, I naturally checked the files tnsnames.ora
and listener.ora
. Here is what they contain (the database in question is SID=TMF).
tnsnames.ora:
GLOB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = linuxoracle11)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = GLOB)
)
)
STATIC =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = linuxoracle11)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = STATIC)
)
)
TMF =
(DESCRIPTION =
(SDU=2048)
(TDU=2048)
(ADDRESS = (PROTOCOL = TCP)(HOST = linuxoracle11)(PORT = 1521))
(CONNECT_DATA =
(SERVER = SHARED)
(SERVICE_NAME = TMF)
)
)
DEMO =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = linuxoracle11)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = DEMO)
)
)
HONDADB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = linuxoracle11)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = HONDADB)
)
)
TEMP =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = linuxoracle11)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = TEMP)
)
)
EXTPROC_CONNECTION_DATA =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1))
)
(CONNECT_DATA =
(SID = PLSExtProc)
(PRESENTATION = RO)
)
)
listener.ora:
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = TMF)
(ORACLE_HOME = /home/oracle/app/oracle/product/11.2.0/dbhome_1)
(SID_NAME = TMF)
)
(SID_DESC =
(GLOBAL_DBNAME = STATIC)
(ORACLE_HOME = /home/oracle/app/oracle/product/11.2.0/dbhome_1)
(SID_NAME = STATIC)
)
(SID_DESC =
(GLOBAL_DBNAME = DEMO)
(ORACLE_HOME = /home/oracle/app/oracle/product/11.2.0/dbhome_1)
(SID_NAME = DEMO)
)
)
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = linuxoracle11)(PORT = 1521))
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
)
For the life of me I cannot see what is wrong. The listener trace log has reams of entries like these:
09-JUN-2020 18:25:21 * service_died * TMF * 12537
09-JUN-2020 18:25:22 * service_register * TMF * 0
09-JUN-2020 18:25:27 * service_died * TMF * 12537
09-JUN-2020 18:25:28 * service_register * TMF * 0
Tue Jun 09 18:25:33 2020
09-JUN-2020 18:25:33 * service_died * TMF * 12537
09-JUN-2020 18:25:34 * service_register * TMF * 0
09-JUN-2020 18:25:39 * service_died * TMF * 12537
09-JUN-2020 18:25:40 * service_register * TMF * 0
Tue Jun 09 18:25:45 2020
09-JUN-2020 18:25:45 * service_died * TMF * 12537
09-JUN-2020 18:25:46 * service_register * TMF * 0
09-JUN-2020 18:25:51 * service_died * TMF * 12537
09-JUN-2020 18:25:52 * service_register * TMF * 0
Tue Jun 09 18:25:57 2020
09-JUN-2020 18:25:57 * service_died * TMF * 12537
09-JUN-2020 18:25:58 * service_register * TMF * 0
09-JUN-2020 18:26:03 * service_died * TMF * 12537
09-JUN-2020 18:26:04 * service_register * TMF * 0
My most recent attempt to fix this is a slight variation on the information in this oracle community post. The setup on the server:
[root@linuxoracle11 trace]# mv /dev/random /dev/xrandom
[root@linuxoracle11 trace]# ln -s /dev/urandom /dev/random
[root@linuxoracle11 trace]# ls -l /dev/*ndom
lrwxrwxrwx 1 root root 12 Jun 9 19:26 /dev/random -> /dev/urandom
crw-rw-rw- 1 root root 1, 9 Jun 9 09:58 /dev/urandom
crw-rw-rw- 1 root root 1, 8 Jun 9 09:58 /dev/xrandom
This did not, however, solve the problem and the jobs running against the database are still aborting.
The most irritating aspect is that sometimes the jobs run through without error. I'd be grateful f anyone has a hint in which direction I should look.
Edit: As requested, here is the content of the sqlnet.ora file:
SQLNET.RECV_TIMEOUT=5
SQLNET.INBOUND_CONNECT_TIMEOUT=5
Yes, that is the default generated when the database system was installed.
There is also the suggestion that it might be due to a "logon storm". That is doubtful, since we have few actual users on the database. Mostly it's those jobs I wrote about running in docker containers. I checked the oracle doco and some info (again from a Burleson Consulting page) and tried the following:
SQL> select resource_name, current_utilization, limit_value
2 from v_$resource_limit
3 where resource_name in ('processes', 'sessions');
RESOURCE_NAME CURRENT_UTILIZATION LIMIT_VALU
------------------------------ ------------------- ----------
processes 25 200
sessions 30 322
So we're talking about a couple of dozen connections, not hundreds.
Edit2: For the sake of completeness, here is a part of the stack trace:
****** Jun 12, 2020 9:20:32 PM IO Error: Connection reset ******
java.sql.SQLRecoverableException: IO Error: Connection reset
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:752)
at oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:666)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:566)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
...
Caused by: java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:115)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at oracle.net.ns.DataPacket.send(DataPacket.java:209)
at oracle.net.ns.NetOutputStream.flush(NetOutputStream.java:215)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:302)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
at oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:429)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:397)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:257)
at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:437)
at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:954)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:639)
... 38 more