26

I have between 1000-2000 webpages to download from one server, and I am using goroutines and channels for efficiency. The problem is that nearly every time I run the program, up to 400 requests fail with the error "connection reset by peer". Occasionally (maybe 1 run out of 10), no requests fail.

What can I do to prevent this?

One interesting observation: when I ran this program on a server in the same country as the one hosting the website, 0 requests failed, so I am guessing latency plays a role (the program now runs on a server on a different continent).

The code I am using is basically just a plain http.Get(url) call, with no extra parameters or custom client.
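The fetch loop looks roughly like this (a reconstructed sketch of the pattern described above, not my actual code; the URL list is a placeholder):

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	urls := []string{ /* 1000-2000 page URLs */ }

	// One goroutine per URL: no cap on concurrency.
	errs := make(chan error, len(urls))
	for _, u := range urls {
		go func(u string) {
			resp, err := http.Get(u)
			if err != nil {
				errs <- err // this is where "connection reset by peer" shows up
				return
			}
			defer resp.Body.Close()
			_, err = io.Copy(io.Discard, resp.Body)
			errs <- err
		}(u)
	}
	for range urls {
		if err := <-errs; err != nil {
			fmt.Println(err)
		}
	}
}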

fgblomqvist
  • Are all, or a large portion of, the pages coming from the same server? What is the max number of requests you're making concurrently? – JimB Jun 12 '16 at 14:46
  • All pages are from the same server (edited the question to reflect this). I am not sure how many are made concurrently. I just start as many goroutines as there are web pages to download and then let the CPU/Go runtime impose the limits on concurrency. – fgblomqvist Jun 12 '16 at 21:12
  • There are no built-in limits on concurrency; you need to impose them yourself. – JimB Jun 12 '16 at 21:22

4 Answers

37

The message connection reset by peer indicates that the remote server sent an RST to forcefully close the connection, either deliberately as a mechanism to limit connections, or as a result of a lack of resources. Either way, you are likely opening too many connections or reconnecting too quickly.

Starting 1000-2000 connections in parallel is rarely the most efficient way to download that many pages, especially when most or all of them come from a single server. If you test the throughput, you will find an optimal concurrency level that is far lower.

You will also want to set Transport.MaxIdleConnsPerHost to match your level of concurrency. If MaxIdleConnsPerHost is lower than the expected number of concurrent connections, server connections will often be closed after a request, only to be immediately reopened; this will slow your progress significantly and may run into connection limits imposed by the server.
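A minimal sketch of both points together, assuming a fixed pool of workers fed from a channel (the worker count of 50 and the URL list are placeholders; measure to find your own optimum):

package main

import (
	"io"
	"log"
	"net/http"
	"sync"
)

func main() {
	urls := []string{ /* 1000-2000 page URLs */ }

	const workers = 50 // placeholder: use the level your measurements favor
	client := &http.Client{Transport: &http.Transport{
		MaxIdleConnsPerHost: workers, // keep idle conns matching concurrency
	}}

	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				resp, err := client.Get(u)
				if err != nil {
					log.Println(u, err)
					continue
				}
				io.Copy(io.Discard, resp.Body) // drain so the conn is reused
				resp.Body.Close()
			}
		}()
	}
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
}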

JimB
  • This is a great answer. I ended up measuring how many simultaneous connections gave the best performance, and for the connection I am currently on, that came out to be about 50; more connections than that gave little to no extra performance. I limited the number of goroutines running to a max of 50, set MaxIdleConnsPerHost to 50, and it works every time now! – fgblomqvist Jun 14 '16 at 14:10
  • @AG1: what code are you looking for? The answer comes down to just setting `MaxIdleConnsPerHost` to equal the number of concurrent requests. – JimB Jul 27 '17 at 12:36
  • @JimB I added the code as an answer to make it more concrete. – AG1 Jul 27 '17 at 22:53
  • @AG1: you can see a more complete example of that in [this answer](https://stackoverflow.com/questions/39813587/go-client-program-generates-a-lot-a-sockets-in-time-wait-state/39834253#39834253) – JimB Jul 27 '17 at 22:55
  • @JimB: A lot of time has passed, but this is still relevant for me right now: would it be possible to just make another request after receiving said error? (I understand that this might not be the best solution.) Does `client.Do()` return an error if the connection is reset? I am not quite sure, because the docs say a non-2XX status code doesn't cause an error. My initial approach was to just wait a bit and then try the same request again. Would this be a valid approach to error handling (in addition to implementing what you proposed in your answer)? – Mxngls Jun 07 '22 at 09:21
  • @Mxngls, that's entirely up to you. If you get an unexpected error, and you want to retry the request, then you can do that. – JimB Jun 07 '22 at 13:09
  • @JimB: Thanks for the quick reply! More specifically, my question is whether error handling of that kind is effective here. Looking at the docs for the http package, I am not sure whether the error comes from the `client.Do()` call that sends the request. – Mxngls Jun 07 '22 at 13:56
  • A network connection could be closed at any point, so you may get that error from `Do()`, or while reading the response. It doesn't really matter, though: networks are unreliable, and if you get an unexpected error and want to retry, that is a perfectly normal thing to do. – JimB Jun 07 '22 at 14:04
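For readers who want the retry approach discussed in the comments above, a minimal sketch (the three attempts and 500ms backoff step are arbitrary placeholders, not anything net/http prescribes):

package main

import (
	"fmt"
	"net/http"
	"time"
)

// get retries on any error, with a short growing pause between attempts.
// An error can also surface later while reading resp.Body; handling that
// is left to the caller.
func get(client *http.Client, url string) (resp *http.Response, err error) {
	for attempt := 1; attempt <= 3; attempt++ {
		resp, err = client.Get(url)
		if err == nil {
			return resp, nil
		}
		time.Sleep(time.Duration(attempt) * 500 * time.Millisecond)
	}
	return nil, err
}

func main() {
	resp, err := get(http.DefaultClient, "http://www.example.com/")
	if err != nil {
		fmt.Println("giving up:", err)
		return
	}
	resp.Body.Close()
}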
20

Still a golang newbie, hopefully this helps.

package main

import (
	"log"
	"net/http"
)

// Shared client so idle connections are reused across requests.
var netClient *http.Client

func init() {
	netClient = &http.Client{Transport: &http.Transport{
		MaxIdleConns:        20,
		MaxIdleConnsPerHost: 20, // match this to your concurrency level
	}}
}

func foo() {
	resp, err := netClient.Get("http://www.example.com/")
	if err != nil {
		log.Println(err)
		return
	}
	// read and drain resp.Body here, then close it, so the connection is reused
	resp.Body.Close()
}
AG1
5

I had good results by setting the MaxConnsPerHost option on the transport:

cl := &http.Client{
    Transport: &http.Transport{MaxConnsPerHost: 50},
}

MaxConnsPerHost optionally limits the total number of connections per host, including connections in the dialing, active, and idle states. On limit violation, dials will block.

https://golang.org/pkg/net/http/#Transport.MaxConnsPerHost

EDIT: To clarify, this option was added in Go 1.11, which was not available at the time of @AG1's or @JimB's answers above; hence me posting this.
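For completeness, a sketch of how such a client might be shared by many goroutines: since dials past the cap simply block, the transport itself does the limiting (the cap of 50 and the URL list are placeholders):

package main

import (
	"io"
	"log"
	"net/http"
	"sync"
)

func main() {
	cl := &http.Client{
		Transport: &http.Transport{MaxConnsPerHost: 50}, // placeholder cap
	}

	urls := []string{ /* page URLs */ }
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			resp, err := cl.Get(u) // blocks while the per-host cap is reached
			if err != nil {
				log.Println(u, err)
				return
			}
			defer resp.Body.Close()
			io.Copy(io.Discard, resp.Body) // drain so the conn is reused
		}(u)
	}
	wg.Wait()
}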

JamesHalsall
  • This is basically the same solution that @AG1 posted over 2 years ago.... – fgblomqvist Aug 27 '19 at 13:52
  • It's not; read my answer carefully. AG1 used `MaxIdleConnsPerHost`, which did not work for me. `MaxConnsPerHost` was introduced in Go 1.11 (released in August 2018), which was not even out when AG1 posted his answer... – JamesHalsall Aug 27 '19 at 14:44
  • Apologies, I read your answer a little too quickly. Nonetheless, thanks for the clarification; it will certainly help future readers. – fgblomqvist Aug 27 '19 at 15:15
  • How can I set a different proxy for every request this way? Is it possible? – Amir Khoshhal Jun 22 '20 at 15:11
0

It might be that the server you are downloading the webpages from has some kind of throttling mechanism that blocks more than a certain number of requests per second (or similar) from a single IP. Try limiting yourself to maybe 100 requests per second, or adding a sleep between requests. "Connection reset by peer" is essentially the server denying you service. (What does "connection reset by peer" mean?)
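A throttling sketch using only the standard library (the 10ms tick, i.e. roughly 100 requests per second, is just the guess from above, not a measured value):

package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	urls := []string{ /* page URLs */ }

	throttle := time.Tick(10 * time.Millisecond) // ~100 requests per second
	for _, u := range urls {
		<-throttle // wait for the next tick before each request
		resp, err := http.Get(u)
		if err != nil {
			log.Println(u, err)
			continue
		}
		resp.Body.Close()
	}
}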

  • Considering that everything runs fine when I run it on a server in the same country as the web server, it seemingly does not have such limits (unless they are only imposed on requests from other countries, which would not make much sense in my scenario). However, I will look into limiting the number of requests per second. – fgblomqvist Jun 12 '16 at 21:14
  • Generally, servers can only handle a certain number of concurrent requests, and you might be past that capacity. A reason it would run fine from the same country is that each request probably takes significantly less time, so the connection isn't tied up as long and the server can handle more. – robbrit Jun 13 '16 at 02:54
  • @robbrit I'm guessing that's probably the case. I will have to implement a connection pool I think. – fgblomqvist Jun 13 '16 at 11:58
  • @fgblomqvist: you don't need a connection pool, the http.Transport already does that for you. Just limit the concurrency, and set `Transport.MaxIdleConnsPerHost` to match your max concurrency. – JimB Jun 13 '16 at 12:45
  • @JimB Want to expand on that? I don't understand how setting MaxIdleConnsPerHost will limit the max open connections to the host. Also, why would I need to limit the concurrency as well? If I start 1000 goroutines, all making one GET request each, they will open ~1000 connections whether they share an HTTP client or not. – fgblomqvist Jun 14 '16 at 08:05