8

I have a setup in Go which basically gets a free port from the OS and then starts an HTTP server on it. It started to give random errors with port binding failures. I simplified it into the following program, which errors after grabbing a few free ports. It happens at random, and there is no actual process running on the port it fails on. It doesn't make sense to me why this should fail. Any help would be appreciated.

Output of the program:

..
..
58479
..
..
58867
58868
58869
..
bound well! 58867
bound well! 58868
bound well! 58869
..
..
..
2015/04/28 09:05:09 Error while binding port: listen tcp :58479: bind: address already in use

I made sure to check that the free port that came out never repeated.

package main

import (
    "fmt"
    "log"
    "net"
    "net/http"
)

func main() {
    for {
        // Ask the OS for a free port, then release it immediately.
        l, err := net.Listen("tcp", ":0")
        if err != nil {
            log.Fatal("Error while getting a free port: ", err)
        }
        _, port, _ := net.SplitHostPort(l.Addr().String())
        l.Close()
        fmt.Println(port)
        go func() {
            // Try to bind the port that was just released.
            l1, err := net.Listen("tcp", ":"+port)
            if err != nil {
                log.Fatal("Error while binding port: ", err)
            }
            fmt.Println("bound well! ", port)
            http.Serve(l1, nil)
        }()
    }
}
Aila
  • It might not matter at all, but what OS are you using? Some OSs limit the number of open sockets, and you could have hit that limit. For example, see [this](https://support2.microsoft.com/default.aspx?scid=kb;en-us;196271) post's solution: apparently on some versions of Windows the maximum number of ephemeral TCP ports is 5000. – Makpoc Apr 28 '15 at 17:09
  • if you get `address already in use`, it's in use. You need to determine why, not assume the OS is wrong. – JimB Apr 28 '15 at 17:51
  • JimB, if the number of ports that can be used is limited in some way and the OS keeps the sockets in TIME_WAIT, he will see exactly this error. Aila, see if [the solutions here](http://stackoverflow.com/questions/5106674/error-address-already-in-use-while-binding-socket-with-address-but-the-port-num) help, and also check whether there are sockets in TIME_WAIT on the port that fails. – Makpoc Apr 28 '15 at 19:17

4 Answers

11

You check whether the port is free at one point in time, and then you try to use it later on the assumption that it is still free. This is not going to work.

What happens: with every iteration of the for loop, you obtain a port number and confirm it's free. Then you spawn a goroutine with the intention of using this port (which has already been released back to the pool of free ports). You don't really know when the goroutine kicks in. It might run after the main goroutine (the for loop) has just obtained another free port, maybe the same one again. Or maybe another process has taken this port in the meantime. Essentially, you can have a race condition on a single port.

After some more research:

There's a small caveat though. Two different sockets can be bound to the same ip+port as long as the local+remote pair is unique. I once wrote a response about it. So when I created a listener with :0, I was able to get a "collision", as shown by netstat -an:

10.0.1.11.65245        *.*                    LISTEN
10.0.1.11.65245        17.143.164.101.5223    ESTABLISHED

Now, the thing is that if you want to explicitly bind a socket to a port that is already in use, this is not possible. Probably because you are able to specify the local address only, and the remote address isn't known until the call to listen or connect (we're talking about syscalls now, not the Go interface). In other words, when you leave the port unspecified, the OS has a wider choice. So if you happen to get a local address that is also being used by another socket, you're unable to bind to it manually.

How to solve it:

As I've mentioned in the comments, your server process should listen on :0 so that the OS picks an available port for it. Once it's listening, the address should be announced to interested processes. You can do it, for example, through a file or standard output.

Community
tomasz
  • Thanks for your comment. What you said makes sense. But how do I avoid this? The port actually goes as an input param to one of my binaries that starts up as a command. I need to determine the port beforehand. But how can I avoid this race condition? I tried using a mutex lock, but I still see the same issue. – Aila Apr 28 '15 at 18:31
  • @Aila, if you need to spawn a server as a separate binary and know the (random) port in your main application, then the binary would need to start a server using port `0` and communicate the assigned port back. That's the only safe way I can think of. You could send it back on stdout or write it to some predefined file (if you spawn many binaries like this, you could attach a pid to the filename – this is something both child and parent processes know). – tomasz Apr 28 '15 at 18:45
  • I agree with your prediction of a race condition, though. I notice that the free ports that are generated are consistently incremental, and the one it errors on was never a port that was generated before. Is it possible that something else is happening, such as Close() on the listener not completing before the next listener is opened on the same port? – Aila Apr 28 '15 at 20:25
  • @Aila, I can see the ports are incremental and they're unique _within your process_, but there are also other processes creating sockets. You can't be sure the OS won't give out a port which you have just _released_. Because there's a period of time when the port is free to use, you will always have a risk of failing to bind the server. I wouldn't dig deeper into the problem, just implement it without the race condition. – tomasz Apr 29 '15 at 13:20
  • @Aila, I've updated the response with some more information explaining why you might be seeing it. Unfortunately, you still don't have a choice but to do it properly :) – tomasz Apr 29 '15 at 15:18
11

First, I check what is using the port:

$ lsof -i :8080

The results are:

COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
WeChat     1527 wonz  241u  IPv4 0xbfe135d7b32e86f1      0t0  TCP wonzdembp:63849->116.128.133.101:http-alt (ESTABLISHED)
__debug_b 41009 wonz    4u  IPv6 0xbfe135e149b1e659      0t0  TCP *:http-alt (LISTEN)

So I kill the listening process by its PID:

$ kill 41009

Then it works.

Wonz
1

It is possible you were previously running or debugging an application on this port, and it did not shut down cleanly. The process might still be hanging out in your system's memory. Kill that process completely, and any other network daemons that might be lurking in the shadows, and try to run your application again.

If you haven't checked for this already, you can use (if using Linux) top, htop, or any GUI system monitor like Windows' Task Manager, Gnome3's System Monitor or KDE's KSysGuard to search for the offending process.

For an example, I have observed that Visual Studio Code's debugger/runner utility (F5/Ctrl+F5) does not always clean up the process, especially if you hit F5 too quickly and the old debugger did not shut down.

Ryan Alex Martin
0

Use reuseport to effectively use the port for listening.

"github.com/libp2p/go-reuseport"

l, err := reuseport.Listen("tcp", ":"+strconv.Itoa(tcpPort))

instead of

l1, err := net.Listen("tcp", ":"+port)