
I am building a tool in Go that needs to make a very large number of simultaneous HTTP requests to many different servers. My initial prototype in Python had no problem doing a few hundred simultaneous requests.

However, I have found that in Go, once the number of simultaneous requests exceeds ~30-40, some of the requests almost always fail with "Get http://www.google.com: dial tcp 216.58.205.228:80: i/o timeout".

I've tested on macOS and openSUSE, on different hardware, in different networks and with different domain lists. Changing the DNS server, as described in other Stack Overflow answers, does not help either.

Interestingly, the failed requests do not even produce a packet, as I confirmed with Wireshark.

Am I doing something wrong, or is this a bug in Go?

Minimal reproducible program:

package main

import (
    "fmt"
    "net/http"
    "sync"
)

func main() {
    domains := []string{/* large domain list here, eg from https://moz.com/top500 */}

    limiter := make(chan string, 50) // Limits simultaneous requests

    wg := sync.WaitGroup{} // Needed to not prematurely exit before all requests have been finished

    for i, domain := range domains {
        wg.Add(1)
        limiter <- domain

        go func(i int, domain string) {
            defer func() { <-limiter }()
            defer wg.Done()

            resp, err := http.Get("http://"+domain)
            if err != nil {
                fmt.Printf("%d %s failed: %s\n", i, domain, err)
                return
            }

            fmt.Printf("%d %s: %s\n", i, domain, resp.Status)
        }(i, domain)
    }

    wg.Wait()
}

Two particular errors occur: a net.DNSError that makes no sense and a nondescript poll.TimeoutError:

&url.Error{Op:"Get", URL:"http://harvard.edu", Err:(*net.OpError)(0xc00022a460)}
&net.OpError{Op:"dial", Net:"tcp", Source:net.Addr(nil), Addr:net.Addr(nil), Err:(*net.DNSError)(0xc000aca200)}
&net.DNSError{Err:"no such host", Name:"harvard.edu", Server:"", IsTimeout:false, IsTemporary:false}

&url.Error{Op:"Get", URL:"http://latimes.com", Err:(*net.OpError)(0xc000d92730)}
&net.OpError{Op:"dial", Net:"tcp", Source:net.Addr(nil), Addr:net.Addr(nil), Err:(*poll.TimeoutError)(0x14779a0)}
&poll.TimeoutError{}

Update:

Running the requests with a separate http.Client, http.Transport and net.Dialer does not make any difference, as can be seen when running the code from this playground.

Neverbolt
  • You are making all requests with http.DefaultClient. What happens when you distribute the requests over a few independent http clients? Perhaps the connection pool is limited to some number of connections. – Peter Aug 25 '18 at 14:05
  • I reworked your code (https://play.golang.org/p/HnKdFG5roj-) and yes, I also find some results rather suspicious. I'm not sure why it would not resolve web.mit.edu / fda.gov / geocities.jp / clickbank.net. However, IMHO it is not related to the concurrency rate. –  Aug 25 '18 at 15:25
  • Also found this along the way: `2018/08/25 17:24:53 Unsolicited response received on idle HTTP channel starting with "HTTP/1.0 408 Request Time-out\r\nServer: AkamaiGHost\r\nMime-Version: 1.0\r\nDate: Sat, 25 Aug 2018 15:24:53 GMT\r\nContent-Type: text/html\r\nContent-Length: 218\r\nExpires: Sat, 25 Aug 2018 15:24:53 GMT\r\n\r\n\nRequest Timeout\n\n Request Timeout \nThe server timed out while waiting for the browser's request. \nReference #2.3ff90a17.1535210693.0\n\n"; err=` –  Aug 25 '18 at 15:26
  • @Peter see the update, it does not make a difference – Neverbolt Aug 26 '18 at 14:51
  • @mh-cbon have you tried lowering the concurrency? With ~5-10 concurrent requests it runs without problems. – Neverbolt Aug 26 '18 at 14:53
  • Yes, it is very similar to my previous tests, 40 failures or so. Some of them I still don't quite understand, because dig resolves them. Even `googleusercontent.com` constantly fails. See also https://github.com/golang/go/issues/18588. I ran it on 1.10; I have not taken the time to switch to 1.11 yet, so it might be worth testing. –  Aug 26 '18 at 16:52
  • @mh-cbon that issue seems pretty much like what is happening here, thank you – Neverbolt Aug 27 '18 at 12:39
  • @Neverbolt Hey did you solve your problem? – CriticalRebel Feb 27 '20 at 17:55
  • @CriticalRebel No, I did not. I reduced the number of parallel requests and chose to run multiple instances of the same program. That seems to point to the open-file limit discussed in the issue mh-cbon linked, which is not yet resolved from a Go standard library standpoint. – Neverbolt Mar 27 '20 at 09:57
  • @Neverbolt, there is a good chance that the DNS server is causing your bottleneck. Google [explicitly states](https://developers.google.com/speed/public-dns/docs/security#rate_limit) it will alter the queries per second per client if it thinks something odd is going on. I cannot imagine it is the only DNS provider that has this defensive measure built in. One way to test this is to override the default [DNS Resolver](https://koraygocmen.medium.com/custom-dns-resolver-for-the-default-http-client-in-go-a1420db38a5d) to use a cache like [here](https://stackoverflow.com/a/40252460/1987437). – Liam Kelly May 25 '21 at 13:42
  • @LiamKelly As I said in the question, a Python client doing the very same thing did not run into any performance issues, so I don't think the DNS server is the bottleneck, as both were using the same server. – Neverbolt May 26 '21 at 14:06
  • @Neverbolt there is a good chance that the Python code is just slower given the GIL. Surprised that there is no DNS tool to measure QPS. Seems pretty straightforward to do in `gopacket`, but probably even more useful to implement via `ebpf`. – Liam Kelly May 27 '21 at 13:27
  • I don't think that is a bug in Go... have you seen https://github.com/codesenberg/bombardier ? – Fulldump Jul 07 '21 at 01:25
  • @Fulldump the tool you linked has nothing to do with resolving large numbers of domain names, so I don't think it applies here – Neverbolt Oct 27 '21 at 13:54

1 Answer

I think many of your net.DNSErrors are actually "too many open files" errors in disguise. You can see this by running your sample code with the netgo build tag (recommendation from here), i.e. go run -tags netgo main.go, which emits errors like:

…dial tcp: lookup buzzfeed.com on 192.168.1.1:53: dial udp 192.168.1.1:53: socket: too many open files

instead of

…dial tcp: lookup buzzfeed.com: no such host

Make sure you're closing the response body (resp.Body.Close()). You can find more about this specific problem at What's the best way to handle "too many open files"? and How to set ulimit -n from a golang program? On my machine (macOS), manually increasing the file limit seemed to help, but I don't think it's a good solution: it doesn't really scale, and I'm not sure how many open files you'd need overall.


As suggested by @liam-kelly, I think the i/o timeout error is coming from a DNS server or some other security mechanism. Setting a custom (bad) DNS server IP gives me the same error.

Cameron Little