I have a Go program that generates a lot of HTTP requests from multiple goroutines. After running for a while, the program fails with the error: connect: cannot assign requested address.

Checking with netstat, I see a high number (28229) of connections in TIME_WAIT.

The TIME_WAIT sockets start piling up when the number of goroutines is 3, and the problem is severe enough to crash the program when it is 5.

I run Ubuntu 14.04 under Docker, with Go version 1.7.

This is the Go program:

package main

import (
        "io/ioutil"
        "log"
        "net/http"
        "sync"
)

var wg sync.WaitGroup
var url = "http://172.17.0.9:3000/"

const num_coroutines = 5
const num_request_per_coroutine = 100000

func get_page() {
        response, err := http.Get(url)
        if err != nil {
                log.Fatal(err)
        }
        defer response.Body.Close()
        // read the full body so the underlying connection can be reused
        if _, err = ioutil.ReadAll(response.Body); err != nil {
                log.Fatal(err)
        }
}

func get_pages() {
        defer wg.Done()
        for i := 0; i < num_request_per_coroutine; i++ {
                get_page()
        }
}

func main() {
        for i := 0; i < num_coroutines; i++ {
                wg.Add(1)
                go get_pages()
        }
        wg.Wait()
}

This is the server program:

package main

import (
    "fmt"
    "log"
    "net/http"
)

var count int

func sayhelloName(w http.ResponseWriter, r *http.Request) {
    count++ // note: handlers run concurrently, so this increment is racy
    fmt.Fprintf(w, "Hello World, count is %d", count) // send data to client side
}

func main() {
    http.HandleFunc("/", sayhelloName)       // set router
    err := http.ListenAndServe(":3000", nil) // set listen port
    if err != nil {
        log.Fatal("ListenAndServe: ", err)
    }
}
wasmup
yigal
  • TIME_WAIT is the normal TCP state after closing a connection. What exactly are you trying to test here? – JimB Oct 02 '16 at 10:36
  • JimB, I am trying to stress test the web server http://172.17.0.9:3000/ and I want to do it using just one client machine. I know this is possible because there are no problems if I set num_coroutines to 2, but I want to use many coroutines. – yigal Oct 02 '16 at 14:04
  • You're opening and closing connections too fast for your server. Is the server you're testing expected to reuse http/1.1 connections, or does it close the connection on every request? – JimB Oct 03 '16 at 12:55
  • JimB, the server program is very simple - I added it to the question. I don't think it is using keep-alive connections. – yigal Oct 03 '16 at 14:27
  • No, the server is using http/1.1 by default. The problem is partly because the server is too simple and not really doing any work, and benchmarking a "hello world" doesn't prove anything since the client is being tested just as much as the server, with confounding issues from the OS and network stack. (Also see https://stackoverflow.com/questions/30352725.) – JimB Oct 03 '16 at 14:33
  • Adding a bit more context to "TIME_WAIT is the normal TCP state after closing a connection": https://serverfault.com/a/23395/117206 – Brent Bradburn Jul 26 '19 at 18:15

1 Answer

The default http.Transport is opening and closing connections too quickly. Since all connections are to the same host:port combination, you need to increase MaxIdleConnsPerHost to match your value for num_coroutines. Otherwise, the transport will frequently close the extra connections, only to have them reopened immediately.

You can set this globally on the default transport:

http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = numCoroutines

Or when creating your own transport:

t := &http.Transport{
    Proxy: http.ProxyFromEnvironment,
    DialContext: (&net.Dialer{
        Timeout:   30 * time.Second,
        KeepAlive: 30 * time.Second,
    }).DialContext,
    MaxIdleConnsPerHost:   numCoroutines,
    MaxIdleConns:          100,
    IdleConnTimeout:       90 * time.Second,
    TLSHandshakeTimeout:   10 * time.Second,
    ExpectContinueTimeout: 1 * time.Second,
}

Similar question: Go http.Get, concurrency, and "Connection reset by peer"

JimB
  • JimB, I used the first option above and it greatly improved the behavior of the program. Now it does not crash for low values of num_coroutines, but it does break for high ones (for example 10000). I'll try the more verbose option and see if it helps more. – yigal Oct 04 '16 at 05:17
  • @yigal: of course it will break if you raise concurrency high enough; what is the point of testing 10000 concurrent connections with a single HTTP client and server over loopback? You only have so many file descriptors and ephemeral ports you can make use of without some system tuning and better configuration. – JimB Oct 04 '16 at 12:30
  • The idea is to stress test our system using just one client machine. The advantage of a single client machine over multiple client machines is that it should be simpler to develop and test the stress-test code. I am trying out Go for this purpose as it is a fast language with low overhead for spawning threads/coroutines. I am not completely versed in Linux optimizations, but my logic says that 10000 concurrent connections should be achievable with stock Linux; I just need to handle the TIME_WAIT problem more efficiently. – yigal Oct 04 '16 at 19:34
  • Any difference when we do a POST request? – James Sapam Dec 20 '16 at 00:21
  • How can I set different proxies for every request in this way? Is it possible? – Amir Khoshhal Jun 22 '20 at 15:16