
I'm running the following bit of code, which just connects and closes a socket in an infinite loop:

import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.spi.SelectorProvider;

public class Main {
    public static void main(String[] args) throws Exception {
        Thread.sleep(1000);
        InetAddress localhost = InetAddress.getByName("127.0.0.1");
        InetSocketAddress localhostRpcbind = new InetSocketAddress(localhost, 111);
        SelectorProvider selectorProvider = SelectorProvider.provider();
        long iterations = 0;
        while (true) {
            try {
                SocketChannel socketChannel = selectorProvider.openSocketChannel();
                socketChannel.connect(localhostRpcbind);
                // the channel is in blocking mode, so connect() only returns
                // once the connection is established; finishConnect() is a
                // no-op here and immediately returns true
                socketChannel.finishConnect();
                socketChannel.close();
                iterations++;
            } catch (Exception e) {
                System.err.println("after " + iterations + " iterations");
                e.printStackTrace(System.err);
                throw e;
            }
        }
    }
}

Port 111 is the port for rpcbind (which is up and running on my machine). On the first run of the code I'll get something like:

after 28239 iterations
java.net.BindException: Cannot assign requested address
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:458)
    at sun.nio.ch.Net.connect(Net.java:450)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
    at Main.main(Main.java:16)

Subsequent runs will fail immediately (0 iterations) until, after a while, I get the first result again (~26-28k iterations, then a failure).

What's going on, and how can I get this connect/disconnect loop to run indefinitely?

I'm running on Linux x64 (Fedora 22).

Note: yes, I know the code is useless and does nothing; it's an SSCCE of a bigger issue I'm trying to investigate.

UPDATE: it looks like I'm running into ephemeral port exhaustion on my machine:

$ cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000

so I have 61000 - 32768 = 28232 ephemeral ports to use for connections, which matches up with the ~28k iterations before the error.
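
While the loop runs I can watch those ports being consumed (a quick check, assuming the ss tool from iproute2 is available; the count includes one header line):

$ ss -tn state time-wait '( dport = :111 )' | wc -l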


1 Answer


The system has run out of ephemeral ports to bind your socket to, after having bound around 28K different ports.

The reason for this problem is that to open a TCP connection, the operating system allocates an ephemeral (source) port and binds the socket to it. After the TCP connection is closed, the socket is left in the TIME_WAIT state, typically for 2 minutes, for historical reasons (https://en.wikipedia.org/wiki/File:Tcp_state_diagram_fixed_new.svg); in my view this time could be reduced on most of today's systems, but that is a topic for another discussion.
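
You can observe those lingering sockets and their remaining TIME_WAIT time with ss (a sketch, again assuming iproute2; the -o flag adds a timer:(timewait,...) column showing the seconds left on each socket):

$ ss -tno state time-wait '( dport = :111 )'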

As one solution, you can reduce this timeout with sysctl:

Change the value of net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait to a low number like 1 and the situation will improve. However, if your application is fast enough to consume ~28K ports in less than 1 second, you will still see this exception.
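
For example (a sketch; this key only exists when the conntrack module is loaded, and on newer kernels it is spelled net.netfilter.nf_conntrack_tcp_timeout_time_wait):

$ sudo sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait=1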

Other TCP parameters you can tune (examples below):

net.ipv4.ip_local_port_range - increase the range of ephemeral ports
net.ipv4.tcp_tw_reuse
net.ipv4.tcp_tw_recycle
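
For example (the values are illustrative; note that tcp_tw_reuse only helps new outgoing connections, and tcp_tw_recycle is known to break clients behind NAT, as the first link below explains):

$ sudo sysctl -w net.ipv4.ip_local_port_range="15000 61000"
$ sudo sysctl -w net.ipv4.tcp_tw_reuse=1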

Look at:
http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html
http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/
