4

I have an Apache CXF 2.7.8 consumer calling another SOAP web service.

In my development environment (Tomcat 6.0, jdk1.7.0_51, Windows 7) everything works charmingly.

However, when I deploy the code to a test environment (WebLogic 12.1, jdk1.7.0_51, RHEL 6), every second request fails with a javax.xml.ws.WebServiceException: Could not send Message, caused by java.net.SocketTimeoutException: Read time out after x millis.

Both the development and test instances call the exact same server.

When I perform a network trace, I see that CXF sends many requests using the same socket connection (thanks to HTTP Keep-Alive). Eventually, the server sends a FIN indicating that the client should stop using this connection (and establish a new one, if need be). The client acknowledges the FIN, but then continues to send the next request on the same socket, despite having been told to disconnect (and acknowledging that directive). The server then sends an RST (as expected), telling the client to go away. The client then tries again. Eventually, enough time has elapsed that we reach the Read timeout, and the SocketTimeoutException above is thrown.

(As an aside: on the Windows development platform, the client honors the FIN and establishes a new socket connection for the next request.)

When I disable HTTP Keep-Alive (using the instructions here), the server sends the FIN after only a single request is sent by the client (exactly as it should). The client still acknowledges the FIN with an ACK for that frame, and then boldly continues trying to use that socket.

I would love to have HTTP Keep-Alive working, but I would settle for the darn thing working without it.

Are there any recommended solutions or next steps for troubleshooting?

Jared
  • Maybe it's related: http://stackoverflow.com/questions/5270981/webservice-java-net-sockettimeoutexception-read-timed-out ? – Jorge Campos Jul 31 '14 at 03:14
  • Software can't ignore a FIN and keep transmitting over the same socket. This must be a platform bug, or else your observations are awry. – user207421 Jul 31 '14 at 03:30
  • Turns out that it was (in my opinion) a platform bug (WebLogic failing to check to ensure pooled socket connections were still valid before using them). Fix (to make WebLogic do that check) in answer below. – Jared Aug 02 '15 at 19:59

2 Answers

3

Here is what we've learned so far, in hopes that it helps someone else:

  1. In Oracle JRE 1.7.0_51-b13, when HTTP keep-alive is enabled for users of java.net.HttpURLConnection and javax.net.ssl.HttpsURLConnection (both subclasses of java.net.URLConnection), and the server terminates the HTTP connection (as the RFC allows it to), the JRE inappropriately continues to use the disconnected socket (when that socket is in what appears to be one of the FIN_WAIT states). In doing so, the JRE sends on the socket and then waits for a response that will never arrive, until the read timeout expires and the SocketTimeoutException is thrown. This functionality works correctly in the IBM JRE under WebSphere, but not in the Oracle JRE under WebLogic.

  2. In the same Oracle JRE release, when the "Connection" header field on an HTTP or HTTPS request is set to "close" (rather than "Keep-Alive"), subclasses of URLConnection continue to attempt to reuse the same underlying socket, even though the RFC says they should not.

Our workaround is to set the "http.keepAlive" system property to "false" to disable keep-alive on all connections. This is not an acceptable long-term workaround, because the extra time and resources needed to set up and tear down a connection on every request are too costly. We'll have to keep working on something to get this working correctly.
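For anyone who wants to try the same workaround, here is a minimal sketch of setting that property from code instead of on the command line. The class and method names (KeepAliveSettings, disableKeepAlive) are made up for illustration. As far as I know the JDK reads "http.keepAlive" only once, when its HTTP client classes are first loaded, so this would need to run before the first HTTP request; passing -Dhttp.keepAlive=false as a JVM argument avoids that ordering concern.

```java
public class KeepAliveSettings {

    /**
     * Hypothetical helper: disables HTTP keep-alive for the JDK's built-in
     * HTTP client by setting the standard "http.keepAlive" system property.
     * Assumption: this runs before any HTTP connection has been opened.
     */
    public static String disableKeepAlive() {
        System.setProperty("http.keepAlive", "false");
        // Return the effective value so callers can log or verify it.
        return System.getProperty("http.keepAlive");
    }

    public static void main(String[] args) {
        System.out.println("http.keepAlive=" + disableKeepAlive());
    }
}
```

Whether a container-specific HTTP stack honors this property may vary, so it is worth confirming the effect with a network trace.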

Jared
  • Hi, any follows on this issue, please? – Jakub May 31 '15 at 22:51
  • Are you sure the problem lies within the JRE implementation? Under WebLogic, the HTTP stack is overridden to use a custom WebLogic HTTP stack (package weblogic.net.http.*)... This sounds similar to JIRA issue CXF-4524. – metatechbe Jun 07 '16 at 08:27
3

tl;dr: Add -Dhttp.keepalivecache.sockethealthchecktimeout=10 to the JVM arguments for the WebLogic server.

Here's what we learned eventually:

The client (Apache CXF 2.7.8 on WebLogic 12c) was sending SOAP HTTP requests to the server (not a WebLogic server).

The server was (at least in some cases) failing to send a 'Connection' header in the response. This resulted in WebLogic not knowing whether it could reuse the connection or not. When it tried to reuse a connection that had been closed by the server, we got the error.

WebLogic has a parameter that can instruct it to perform a health check on the reused connection prior to reusing it, and evict it from the pool if it fails the health check. Setting the system property 'http.keepalivecache.sockethealthchecktimeout' to a very low value (say 10, for 10 milliseconds) fixed the problem.
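To illustrate the idea of a health check before reuse, here is a rough sketch in plain Java. It is not WebLogic's implementation (I have not seen that code); the class SocketHealthCheck and its methods are hypothetical. The check does a read with a very short timeout on an idle pooled socket: a timeout means the peer has sent neither data nor a FIN, while end-of-stream (or any unexpected byte) means the socket should be evicted rather than reused.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SocketHealthCheck {

    /** Returns true if an idle pooled socket still looks safe to reuse. */
    public static boolean isReusable(Socket socket, int timeoutMillis) {
        if (socket.isClosed() || !socket.isConnected()) {
            return false;
        }
        try {
            int previousTimeout = socket.getSoTimeout();
            socket.setSoTimeout(timeoutMillis);
            try {
                // read() returns -1 once the peer's FIN has arrived. Any real byte
                // on an idle HTTP connection is unexpected, so treat both as stale.
                socket.getInputStream().read();
                return false;
            } catch (SocketTimeoutException nothingPending) {
                // No data and no FIN within the timeout: the socket looks alive.
                return true;
            } finally {
                socket.setSoTimeout(previousTimeout);
            }
        } catch (IOException e) {
            // Reset or other I/O failure: do not reuse.
            return false;
        }
    }

    /** Loopback demo: checks a socket before and after the server side closes it. */
    public static boolean[] demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            boolean beforeClose = isReusable(client, 10);
            accepted.close(); // the server side sends its FIN
            // A longer timeout here just gives the FIN time to arrive in the demo.
            boolean afterClose = isReusable(client, 200);
            return new boolean[] { beforeClose, afterClose };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("reusable before server close: " + result[0]);
        System.out.println("reusable after server close:  " + result[1]);
    }
}
```

A pool using a check like this would call isReusable just before handing out a cached connection and open a fresh socket whenever it returns false. The short timeout keeps the cost per request small, which is presumably why a value around 10 milliseconds worked for us.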

Jared