
We have a Java Spring Integration application running on AWS (multiple pods in a Kubernetes cluster). We use TCP outbound gateways to communicate with third-party systems and cache these connections with a CachingClientConnectionFactory. We have set soKeepAlive to true on the factory, but the connection is still dropped after 350 seconds of idle time. Do we need anything else in the configuration to ping the server a little before the 350-second idle limit is reached? AWS describes the 350-second limit here: https://docs.aws.amazon.com/vpc/latest/userguide/nat-gateway-troubleshooting.html#nat-gateway-troubleshooting-timeout

Our connection factory and gateway configuration is as follows:

    @Bean
    public AbstractClientConnectionFactory primeClientConnectionFactory() {
        TcpNetClientConnectionFactory tcpNetClientConnectionFactory = new TcpNetClientConnectionFactory(host, port);

        tcpNetClientConnectionFactory.setDeserializer(new PrimeCustomStxHeaderLengthSerializer());
        tcpNetClientConnectionFactory.setSerializer(new PrimeCustomStxHeaderLengthSerializer());
        tcpNetClientConnectionFactory.setSingleUse(false);
        tcpNetClientConnectionFactory.setSoKeepAlive(true);

        return tcpNetClientConnectionFactory;
    }

    @Bean
    public AbstractClientConnectionFactory primeTcpCachedClientConnectionFactory() {
        CachingClientConnectionFactory cachingConnFactory = new CachingClientConnectionFactory(primeClientConnectionFactory(), connectionPoolSize);
        //cachingConnFactory.setSingleUse(false);
        cachingConnFactory.setLeaveOpen(true);
        cachingConnFactory.setSoKeepAlive(true);
        return cachingConnFactory;
    }

    @Bean
    public MessageChannel primeOutboundChannel() {
        return new DirectChannel();
    }

    @Bean
    public RequestHandlerRetryAdvice retryAdvice() {
        RequestHandlerRetryAdvice retryAdvice = new RequestHandlerRetryAdvice();
        RetryTemplate retryTemplate = new RetryTemplate();
        FixedBackOffPolicy fixedBackOffPolicy = new FixedBackOffPolicy();
        fixedBackOffPolicy.setBackOffPeriod(500);
        SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy();
        retryPolicy.setMaxAttempts(3);
        retryTemplate.setBackOffPolicy(fixedBackOffPolicy);
        retryTemplate.setRetryPolicy(retryPolicy);
        retryAdvice.setRetryTemplate(retryTemplate);
        return retryAdvice;
    }

    @Bean
    @ServiceActivator(inputChannel = "primeOutboundChannel")
    public MessageHandler primeOutbound(AbstractClientConnectionFactory primeTcpCachedClientConnectionFactory) {
        TcpOutboundGateway tcpOutboundGateway = new TcpOutboundGateway();
        List<Advice> list = new ArrayList<>();
        list.add(retryAdvice());
        tcpOutboundGateway.setAdviceChain(list);

        tcpOutboundGateway.setRemoteTimeout(timeOut);
        tcpOutboundGateway.setRequestTimeout(timeOut);
        tcpOutboundGateway.setSendTimeout(timeOut);
        tcpOutboundGateway.setConnectionFactory(primeTcpCachedClientConnectionFactory);
        return tcpOutboundGateway;
    }

}
Aakash
  • Are you sure the idle timeout isn't on the receiving end? – jordanm Aug 05 '22 at 22:44
  • We asked the server side, and they said they can support indefinitely open sockets. Also, when a socket drops they see the message below on their end, which seems to indicate that the client (AWS or the Java app) dropped the connection: job BSOCR16401 on port 16401 re-submitted after error condition 0001. Cause . . . . . : TCP/IP job BSOCR16401 on port 16401 is being re-submitted after receiving an error condition of return code 0001 - read A connection with a remote socket was reset by that socket. – Aakash Aug 07 '22 at 11:05

1 Answer


See this SO thread for more about Keep Alive: Does a TCP socket connection have a "keep alive"?.

The current Java networking API has this class:

/**
 * Defines extended socket options, beyond those defined in
 * {@link java.net.StandardSocketOptions}. These options may be platform
 * specific.
 *
 * @since 1.8
 */
public final class ExtendedSocketOptions {

It provides this constant:

/**
 * Keep-Alive idle time.
 *
 * <p>
 * The value of this socket option is an {@code Integer} that is the number
 * of seconds of idle time before keep-alive initiates a probe. The socket
 * option is specific to stream-oriented sockets using the TCP/IP protocol.
 * The exact semantics of this socket option are system dependent.
 *
 * <p>
 * When the {@link java.net.StandardSocketOptions#SO_KEEPALIVE
 * SO_KEEPALIVE} option is enabled, TCP probes a connection that has been
 * idle for some amount of time. The default value for this idle period is
 * system dependent, but is typically 2 hours. The {@code TCP_KEEPIDLE}
 * option can be used to affect this value for a given socket.
 *
 * @since 11
 */
public static final SocketOption<Integer> TCP_KEEPIDLE
        = new ExtSocketOption<Integer>("TCP_KEEPIDLE", Integer.class);
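To see why SO_KEEPALIVE alone did not help, here is a quick check. It is a sketch that assumes JDK 11+ (where jdk.net.ExtendedSocketOptions.TCP_KEEPIDLE exists), and the class name KeepIdleDefault is made up for illustration. It enables SO_KEEPALIVE on a fresh socket and reads back the TCP_KEEPIDLE value the OS would use without changing it. On Linux this usually reflects net.ipv4.tcp_keepalive_time, commonly 7200 seconds, which is far beyond the 350-second NAT limit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.StandardSocketOptions;
import java.util.OptionalInt;
import jdk.net.ExtendedSocketOptions;

// Hypothetical helper class, for illustration only.
public class KeepIdleDefault {

    /**
     * Enables SO_KEEPALIVE and reports the TCP_KEEPIDLE value the socket currently
     * uses, without changing it. Empty if the platform does not expose TCP_KEEPIDLE.
     */
    static OptionalInt defaultKeepIdleSeconds(Socket socket) {
        try {
            socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
            if (!socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
                return OptionalInt.empty();
            }
            return OptionalInt.of(socket.getOption(ExtendedSocketOptions.TCP_KEEPIDLE));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket()) {
            OptionalInt idle = defaultKeepIdleSeconds(socket);
            if (idle.isPresent()) {
                // The value is system dependent, so no fixed output is assumed here.
                System.out.println("SO_KEEPALIVE on, first probe after " + idle.getAsInt() + " s idle");
                System.out.println("Fires before a 350 s NAT idle timeout: " + (idle.getAsInt() < 350));
            } else {
                System.out.println("TCP_KEEPIDLE not supported on this platform");
            }
        }
    }
}
```

If the reported idle time is larger than 350, the NAT gateway drops the mapping before the first probe is ever sent. That matches the behaviour described in the question.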

So what we need on the TcpNetClientConnectionFactory is this setter:

public void setTcpSocketSupport(TcpSocketSupport tcpSocketSupport) {

Implement the void postProcessSocket(Socket socket) method of TcpSocketSupport so that it does this:

try {
    // Start keep-alive probes after 349 seconds of idle time, just under the 350 s NAT timeout
    socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, 349);
}
catch (IOException ex) {
    throw new UncheckedIOException(ex);
}

The value 349 is chosen to stay just under the 350-second idle timeout described in the AWS doc you shared.
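A slightly fuller version of that postProcessSocket body might look like the sketch below. It is plain JDK code (JDK 11+), and the class name NatKeepAlive, the 10-second margin, and the interval/count values are hypothetical choices, not anything from the answer or from Spring. It leaves a safety margin below the NAT timeout, which yields the 340 seconds mentioned in the comment below. It also sets TCP_KEEPINTERVAL and TCP_KEEPCOUNT where the platform supports them. On Linux, a probe that the peer ACKs resets the idle timer, so on a healthy connection the gap between packets is the idle time. The interval and count only matter when probes go unanswered. In the Spring setup, a subclass of DefaultTcpSocketSupport could call super.postProcessSocket(socket) and then NatKeepAlive.apply(...) in its postProcessSocket override. That subclass would then be registered through setTcpSocketSupport on the TcpNetClientConnectionFactory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketOption;
import java.net.StandardSocketOptions;
import jdk.net.ExtendedSocketOptions;

// Hypothetical helper class, for illustration only.
public final class NatKeepAlive {

    // AWS NAT gateway idle timeout in seconds, per the AWS doc linked in the question.
    static final int NAT_IDLE_TIMEOUT_SECONDS = 350;

    private NatKeepAlive() {
    }

    /** Picks a keep-alive idle time that leaves a safety margin below the NAT timeout. */
    static int idleSecondsFor(int natTimeoutSeconds, int marginSeconds) {
        if (marginSeconds <= 0 || marginSeconds >= natTimeoutSeconds) {
            throw new IllegalArgumentException("margin must be between 1 and natTimeoutSeconds - 1");
        }
        return natTimeoutSeconds - marginSeconds;
    }

    /**
     * Enables SO_KEEPALIVE and sets the keep-alive timings where supported.
     * Returns false if the platform does not support TCP_KEEPIDLE.
     */
    static boolean apply(Socket socket, int idleSeconds, int intervalSeconds, int probeCount) {
        try {
            socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
            if (!socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
                return false;
            }
            socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
            // Only relevant when probes go unanswered: retry spacing and retry limit.
            setIfSupported(socket, ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSeconds);
            setIfSupported(socket, ExtendedSocketOptions.TCP_KEEPCOUNT, probeCount);
            return true;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    private static void setIfSupported(Socket socket, SocketOption<Integer> option, int value)
            throws IOException {
        if (socket.supportedOptions().contains(option)) {
            socket.setOption(option, value);
        }
    }

    public static void main(String[] args) throws IOException {
        int idle = idleSecondsFor(NAT_IDLE_TIMEOUT_SECONDS, 10); // 350 - 10 = 340
        try (Socket socket = new Socket()) {
            System.out.println("keep-alive tuned: " + apply(socket, idle, 10, 3) + ", idle = " + idle + " s");
        }
    }
}
```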

See also the Spring Integration docs: https://docs.spring.io/spring-integration/docs/current/reference/html/ip.html#the-tcpsocketsupport-strategy-interface

Artem Bilan
  • This resolved the issue. Our application now pings every 340 seconds and keeps the connection alive. – Aakash Aug 15 '22 at 19:20