6

We have a problem in production for quite a while now...

This is a follow up of: my other question but with way more specifics so I think posting it as a new question is justified (if not I'll just add this info to the other question).

Here it goes:

So, we have a file descriptor leak on aix with (ibm) java 6 weblogic using an application using cxf and we address a web service of our own an also a jsb which routes towards our ws.

When using File Leak Detector as agent in the startup of weblogic and dumping the getCurrentOpenFiles() and filtering on Listener.SocketRecord programatically we have 2000+ open sockets;

These are java sockets and file descriptors, the net-sockets (viewed with netstat) are properly closed, by time, but the programmatic ones (and the ones viewed with

lsof -p $pid_of_managed_server 2> /dev/null|grep TCP|wc -l

)stay open (and eventually cause a too many open files issue)).

This is the head of the stack of one of those open file descriptors inside the jvm:

record socket to tst-cjcsr.com/10.239.7.19:443 by thread:[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' on Thu Nov 10 10:25:56 CET 2016
    at java.net.PlainSocketImpl.create(PlainSocketImpl.java:188)
    at java.net.Socket.createImpl(Socket.java:411)
    at java.net.Socket.connect(Socket.java:544)
    at weblogic.net.http.HttpsClient.openWrappedSSLSocket(HttpsClient.java:565)
    at weblogic.net.http.HttpsClient.openServer(HttpsClient.java:296)
    at weblogic.net.http.HttpsClient.openServer(HttpsClient.java:373)
    at weblogic.net.http.HttpsClient.New(HttpsClient.java:528)
    at weblogic.net.http.HttpsURLConnection.connect(HttpsURLConnection.java:239)
    at weblogic.net.http.HttpURLConnection.getInputStream(HttpURLConnection.java:409)
    at weblogic.net.http.SOAPHttpsURLConnection.getInputStream(SOAPHttpsURLConnection.java:37)
    at weblogic.net.http.HttpURLConnection.getResponseCode(HttpURLConnection.java:1038)
    at org.apache.cxf.transport.http.URLConnectionHTTPConduit$URLConnectionWrappedOutputStream.getResponseCode(URLConnectionHTTPConduit.java:266)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.doProcessResponseCode(HTTPConduit.java:1550)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponseInternal(HTTPConduit.java:1579)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponse(HTTPConduit.java:1520)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.close(HTTPConduit.java:1317)
    at org.apache.cxf.transport.AbstractConduit.close(AbstractConduit.java:56)
    at org.apache.cxf.transport.http.HTTPConduit.close(HTTPConduit.java:632)
    at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:62)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:272)
    at org.apache.cxf.endpoint.ClientImpl.doInvoke(ClientImpl.java:572)
    at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:481)
    at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:382)
    at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:335)
    at org.apache.cxf.frontend.ClientProxy.invokeSync(ClientProxy.java:96)
    at org.apache.cxf.jaxws.JaxWsClientProxy.invoke(JaxWsClientProxy.java:136)
    at com.sun.proxy.$Proxy380.requestOurRecord(Unknown Source)
    at some.package.application.dossier.DossierBean$1.call(DossierBean.java:225)

I can imagine that this is a applicative issue because I can't find other cases online with similar defects.

Does anyone can make more sense from this stack?

For instance: It strikes me as odd that HTTPConduit.close() wants to create a connection...

Another point: the issue doesn't occur with the same stack of technologies for non-https calls. (which does makes sense because the stacktrace mentions HttpsClient)

CXF version: 2.7.18

cxf config:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:jaxws="http://cxf.apache.org/jaxws"
       xmlns:soap="http://cxf.apache.org/bindings/soap"
       xmlns:cxf="http://cxf.apache.org/core"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="
            http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
            http://cxf.apache.org/core http://cxf.apache.org/schemas/core.xsd
            http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.1.xsd
            http://cxf.apache.org/jaxws http://cxf.apache.org/schemas/jaxws.xsd">
    <import resource="classpath:META-INF/cxf/cxf.xml"/>

    <context:property-placeholder location="file:${config.file.location}cjr-extract-${environment}.properties"/>

    <jaxws:client id="OurService"
                  serviceClass="some.package.ws.extract.generated.OurService"
                  address="${ws.extract.wsdl.endpoint}">
        <jaxws:binding>
            <soap:soapBinding version="1.2"/>
        </jaxws:binding>
    </jaxws:client>

    <cxf:bus>
        <cxf:outInterceptors>
            <bean class="our.package.interceptors.SomeInterceptor"
                  id="webSecurityInterceptor">
                <constructor-arg>
                    <map>
                        <entry key="action" value="Timestamp Signature"/>
                        <entry key="user" value="${org.apache.ws.security.crypto.merlin.keystore.alias}"/>
                        <entry key="passwordCallbackRef">
                            <ref bean="passwordCallBack"/>
                        </entry>
                        <!--entry key="signaturePropFile" value="properties/our.properties"/-->
                        <entry key="signaturePropFile" value="file:${location}/our.properties"/>
                        <entry key="signatureKeyIdentifier" value="DirectReference" />
                    </map>
                </constructor-arg>
            </bean>
        </cxf:outInterceptors>
        <cxf:properties>
            <entry key="signatureParts"
                   value="{Element}{http://www.w3.org/2003/05/soap-envelope}Body;{Element}{http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd}Timestamp"/>
        </cxf:properties>
    </cxf:bus>

    <bean id="passwordCallBack" class="our.package.authentication.PasswordCallbackHandler">
        <property name="password" value="${password}"/>
        <property name="alias" value="${org.apache.ws.security.crypto.merlin.keystore.alias}"/>
    </bean>
</beans>

Java:

java version "1.6.0"
Java(TM) SE Runtime Environment (build pap3260sr15fp1-20140110_01(SR15 FP1))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 AIX ppc-32 jvmap3260sr15-
20131231_180656 (JIT enabled, AOT enabled)
J9VM - 20131231_180656
JIT  - r9_20130920_46510ifx3
GC   - GA24_Java6_SR15_20131231_1152_B180656)
JCL  - 20140107_01

Weblogic 10.3.6

Thanks for hints, pointers, and, of course - if possible -, answers ;-)

S.

EJP requested code, here it is:

The call itself:

private OurService ourservice;

private Callable<RequestOurRecordResponse> callService(final RequestOurRecordRequest request) {
    return new Callable<RequestOurRecordResponse>() {
        @Override
        public RequestOurRecordResponse call() throws Exception {
            try {
                return ourService.requestOurRecord(request); // this is line 225.
            } catch (Exception e) {
                facesMessages.error("technicalError");
                log.error("Encountered technical error", e);
                return null;
            }
        }
    };
}

The generated webservice:

package some.package.ws.extract.generated;

import javax.jws.WebMethod;
import javax.jws.WebParam;
import javax.jws.WebResult;
import javax.jws.WebService;
import javax.jws.soap.SOAPBinding;
import javax.xml.bind.annotation.XmlSeeAlso;


/**
 * This class was generated by the JAX-WS RI.
 * JAX-WS RI 2.1.7-b01-
 * Generated source version: 2.1
 * 
 */
@WebService(name = "OurService", targetNamespace = "http://secret/service-v1.0-rc2")
@SOAPBinding(parameterStyle = SOAPBinding.ParameterStyle.BARE)
@XmlSeeAlso({
    ObjectFactory.class
})
public interface OurService {
    /**
     * 
     * @param requestOurRecordRequest
     * @return
     *     returns some.package.ws.extract.generated.RequestOurRecordResponse
     */
    @WebMethod(action = "http://secret/service/RequestOurRecord")
    @WebResult(name = "requestOurRecordResponse", targetNamespace = "http://secret/service-v1.0-rc2", partName = "requestOurRecordResponse")
    public RequestOurRecordResponse requestOurRecord(
        @WebParam(name = "requestOurRecordRequest", targetNamespace = "http://secret/service-v1.0-rc2", partName = "requestOurRecordRequest")
        RequestOurRecordRequest requestOurRecordRequest);
}

with:

<http-conf:conduit name="*.http-conduit">
    <http-conf:client Connection="Close" />
</http-conf:conduit>

The connections (os level) close; but we still have the leak (behavior stated above).

with:

<http-conf:conduit name="*.http-conduit">
    <http-conf:client Connection="Keep-Alive" />
</http-conf:conduit>

Both the connections (os level) as the file descriptors keep going up...

(additional info: the os level connections go down (after the time out I guess) but the file descriptors stay open...)

SanThee
  • 2,301
  • 3
  • 23
  • 35
  • @EJP The problem is in the configuration of CXF. This is why there is no code. I can't go and paste CXF's code: we just don't know what's going on internally there. You want the code that calls? Fine, that's no issue, will be done in a few minutes, but it won't give you any more hints. – Olivier Grégoire Nov 18 '16 at 10:05
  • @EJP: All of the open file descriptors seem to be stuck at java.net.PlainSocketImpl.create(PlainSocketImpl.java:188); we can search for this code in the OpenJdk but not in the IBM one (which causes the issue) furthermore; the stacktrace uses a lot of weblogic internal code, which is called by cxf, which we call, but our code is limited to the code Olivier provided (with the config I provided) – SanThee Nov 18 '16 at 10:24
  • have you tried timeouts? If you didnt check yet i found this issue http://stackoverflow.com/questions/5656458/java-net-socketexception-too-many-open-files/37605213#37605213 – HRgiger Nov 18 '16 at 10:28
  • Thanks @HRgiger we've already increased the open file limit. It only delays the issue to later in time, and we can't go much higher. We'll definitely check the TIME_WAIT settings. – Olivier Grégoire Nov 18 '16 at 10:34
  • @HRgiger: we did: the tcp_timewait (it's a aix machine) is set to 1 -> this means 15 seconds; it's not the network connections that leak but the file descriptors associated with them; the network connections get closed properly but the file descriptors remain and augment... thanks for the suggestion though... – SanThee Nov 18 '16 at 10:46
  • well at least i tried:) he mentioned "socket connections are treated like files and they use file descriptor, which is a limited resource" that is why I thought it might worth to try. – HRgiger Nov 18 '16 at 10:49

1 Answers1

1

From my perspective the problem is related to cxf configuration, perhaps interplay of weblogic + cxf is also an issue, because it's the cxf component that manages the connection (see the stacktrace). So I suggested trying with the different keep-alive settings (how to set keep-alive is described here). Seems like this didn't help beat open file descriptors though.

UPDATE:

After thinking more about your problem:

at org.apache.cxf.transport.http.URLConnectionHTTPConduit$URLConnectionWrappedOutputStream.getResponseCode(URLConnectionHTTPConduit.java:266)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.doProcessResponseCode(HTTPConduit.java:1550)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponseInternal(HTTPConduit.java:1579)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponse(HTTPConduit.java:1520)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.close(HTTPConduit.java:1317)

seems like when closing http connection the cxf tries to get some response code back and creates a socket for that, and as those sockets just wait for response that never comes indefinitely that creates a leak. The whole flow is described here under a Simplified Client Workflow section. So, the next question is, does tst-cjcsr.just.fgov.be ever reply? And, my next suggestion is to configure ReceiveTimeout for the http-conf:client as described in the link above.

borowis
  • 1,207
  • 10
  • 17
  • How is this different from the comments of HRgiger? Bamboomy already stated that the connection is closed. It's only the file descriptor that's left open. – Olivier Grégoire Nov 24 '16 at 14:16
  • 1
    from my perspective the problem is related to cxf configuration, perhaps interplay of weblogic + cxf is also an issue. by setting keep-alive to false I hope that the cxf will explicitly close the sockets and therefore file descriptors will be released. And Bamboomy stated that the programmatically open sockets stay open if I understood correctly – borowis Nov 24 '16 at 14:21
  • 1
    It makes sense to me, I'll give it a spin and let you know, this will be around tomorrow morning; thanks for the suggestion (and for thinking out of the box ;) ) – SanThee Nov 24 '16 at 15:23
  • 1
    I've edited the question, both "Keep-Alive" as "close" as value for "Connection" suffer from the problem... – SanThee Nov 25 '16 at 12:39
  • We're actively trying get your insights working in our code before I can't award the bounty anymore, without success so far. – Olivier Grégoire Nov 25 '16 at 15:50
  • @Bamboomy, Olivier, hey guys, now it seems to me like the server you're querying for criminal records is never replying, and I don't know if it's true but at least it looks like it from the stacktrace. I updated the answer to include that. – borowis Nov 27 '16 at 18:05
  • @BorysZibrov: Hey, the code which calls the webservice was shown in my original question http://stackoverflow.com/questions/40401064/weblogic-doesnt-close-files-but-does-close-connections but adapting the ReceiveTimeout will not help, I assume; from the docs I read that the default timeout is 60.000; however, the file descriptors don't get closed after 60 seconds – SanThee Nov 28 '16 at 09:11
  • yes, I've read that as well, but you can see in your stacktrace that sockets are waiting for response: `at weblogic.net.http.HttpURLConnection.getResponseCode(HttpURLConnection.java:1038)` – borowis Nov 28 '16 at 11:05