
Right now I'm testing an extremely simple semaphore in one of my production regions in AWS. On deployment the latency jumped from 150ms to 300ms. I assumed some latency would occur, but if it could be reduced that would be great. This is a bit new to me so I'm experimenting. I've set the semaphore to allow 10,000 connections, which is the same as the maximum number of connections Redis is configured for. Is the code below optimal? If not, can someone help me optimize it, or point out if I'm doing something wrong? I want to keep this as a piece of middleware so that I can simply call it on the server like this: n.UseHandler(wrappers.DoorMan(wrappers.DefaultHeaders(myRouter), 10000)).

package wrappers

import "net/http"

// DoorMan limits the number of concurrent requests to n.
func DoorMan(h http.Handler, n int) http.Handler {
    sema := make(chan struct{}, n)

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        sema <- struct{}{}        // acquire a slot; blocks when n requests are already in flight
        defer func() { <-sema }() // release the slot when the handler returns

        h.ServeHTTP(w, r)
    })
}
Jonathan Hall
reticentroot
  • That semaphore on its own won't add any noticeable latency to the requests. Have you checked that you actually have more than 10000 concurrent requests? If you are reaching a blocking state on that semaphore, my guess is that you were benefitting from the extra concurrency while previously only blocking around redis requests. – JimB Mar 22 '17 at 16:54
  • AWS reports 30,000 requests per minute. I'm load balancing on two servers. My fear is when I move the code over to the east coast, where we peak at 300,000 per minute. Also, most of my latency comes from our production mongo servers. Although we use redis where we can, when traffic increases mongo gets slammed and latency spikes nationally. So I'm following the mongo (mgo) documentation and trying to limit at the "door". – reticentroot Mar 22 '17 at 17:00
  • 1
    This is about as efficient as it could be, so you're going to have to do some better system profiling (avg latency basically tells you nothing useful). Blocking handlers could be causing a spike in active connections, you may want to try limiting network connections directly to see if that helps. – JimB Mar 22 '17 at 17:06
  • 2
    30,000 request **per minute** is not equal to 10,000 concurrent / simultaneous requests (there is no direct correlation, it really depends on how long serving the requests take). – icza Mar 22 '17 at 17:19
  • I agree @icza. I used that number because it's the max number of concurrent connections Redis can handle. So mentally I know it should never be more than that... still experimenting. – reticentroot Mar 22 '17 at 17:21
  • Related: [Process Management for the Go Webserver](http://stackoverflow.com/questions/37529511/process-management-for-the-go-webserver/37531953#37531953). – icza Mar 22 '17 at 17:21
  • If your limit is Redis, why don't you limit the connections to Redis then? Virtually all the libraries I've seen let you specify the maximum number of connections for the pool. If all connections are in use, the caller will just wait for one to become available. The number of concurrent requests seems unrelated to me. – Peter Apr 11 '17 at 17:04
  • @peter that is sane, but will also increase request latency which is what OP tries to normalize/get down. I guess OP rather wants to drop requests with a 4xx code than to have them wait until redis/mongo deigns to respond. – RickyA Aug 28 '17 at 10:03

2 Answers


The solution you outline has some issues. But first, let's take a small step back; there are two questions in this, one of them implied:

  1. How do you rate limit inbound connections efficiently?
  2. How do you prevent overloading a backend service with outbound connections?

What it sounds like you want to do is actually the second, to prevent too many requests from hitting Redis. I'll start by addressing the first one and then make some comments on the second.

Rate limiting inbound connections

If you really do want to rate limit inbound connections "at the door", you should normally never do that by waiting inside the handler. With your proposed solution, the service will keep accepting requests, which will queue up at the sema <- struct{}{} statement. If the load persists, it will eventually take down your service, either by running out of sockets, memory, or some other resource. Also note that if your request rate is approaching saturation of the semaphore, you would see an increase in latency caused by goroutines waiting at the semaphore before handling the request.

A better way to do it is to always respond as quickly as possible (especially when under heavy load). This can be done by sending a 503 Service Unavailable back to the client, or to a smart load balancer in front of the service, telling it to back off.

In your case, it could for example look like something along these lines:

select {
case sema <- struct{}{}:
    // A slot is free: serve the request and release the slot when done.
    defer func() { <-sema }()
    h.ServeHTTP(w, r)
default:
    // All n slots are taken: fail fast instead of queueing.
    http.Error(w, "Overloaded", http.StatusServiceUnavailable)
}

Rate limiting outbound connections to a backend service

If the reason for the rate limit is to avoid overloading a backend service, what you typically want to do instead is react to that service being overloaded and apply back pressure through the request chain.

In practical terms, this could mean something as simple as wrapping all calls to the backend with the same kind of semaphore logic as above, and returning an error up the request's call chain when the semaphore overflows.
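A minimal sketch of such a wrapper, assuming a hypothetical CallWithLimit helper and an illustrative limit of 100 concurrent backend calls (tune the limit to whatever your Redis/Mongo pool allows):

package backend

import "errors"

// ErrOverloaded is returned when all backend slots are already in use.
var ErrOverloaded = errors.New("backend overloaded")

// sema caps the number of concurrent backend calls.
var sema = make(chan struct{}, 100)

// CallWithLimit runs call while holding a semaphore slot, and fails fast
// with ErrOverloaded instead of queueing when the limit is reached.
func CallWithLimit(call func() error) error {
    select {
    case sema <- struct{}{}:
        defer func() { <-sema }()
        return call()
    default:
        return ErrOverloaded
    }
}

Your HTTP handler can then translate ErrOverloaded into a 503, or into whatever fallback behaviour makes sense for that endpoint.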

Additionally, if the backend sends status codes like 503 (or equivalent), you should typically propagate that indication downwards in the same way, or resort to some other fallback behaviour for handling the incoming request.

You might also want to consider combining this with a circuit breaker, cutting off attempts to call the backend service quickly if it seems to be unresponsive or down.

Rate limiting by capping the number of concurrent or queued connections as above is usually a good way to handle overload. When the backend service is overloaded, requests will typically take longer, which will then reduce the effective number of requests per second. However, if, for some reason, you want a fixed limit on the number of requests per second, you could do that with a rate.Limiter instead of a semaphore.
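For example, a rough sketch using golang.org/x/time/rate (the rps and burst parameters are placeholders you would tune for your traffic):

package wrappers

import (
    "net/http"

    "golang.org/x/time/rate"
)

// RateLimit rejects requests that exceed a fixed requests-per-second budget.
func RateLimit(h http.Handler, rps float64, burst int) http.Handler {
    limiter := rate.NewLimiter(rate.Limit(rps), burst)

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !limiter.Allow() {
            // Over budget: fail fast rather than queueing the request.
            http.Error(w, "Overloaded", http.StatusServiceUnavailable)
            return
        }
        h.ServeHTTP(w, r)
    })
}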

A comment on performance

The cost of sending and receiving a trivial object on a channel should be sub-microsecond. Even on a highly contended channel, it wouldn't be anywhere near 150 ms of additional latency just to synchronise on the channel. So, assuming the work done in the handler is otherwise the same, whatever your latency increase comes from, it is almost certainly associated with goroutines waiting somewhere (e.g. on I/O, or for access to synchronised regions that are blocked by other goroutines).

If you are getting incoming requests at a rate close to what can be handled with your set concurrency limit of 10000, or if you are getting spikes of requests, it is possible you would see such an increase in average latency stemming from goroutines in the wait queue on the channel.

Either way, this should be easily measurable; you could for example trace timestamps at certain points in the handling pathway. I would do this on a sample (e.g. 0.1%) of all requests to avoid having the log output affect the performance.
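As a rough sketch of what I mean, a timing wrapper that logs only about 0.1% of requests (the sample rate and log format are arbitrary choices):

package wrappers

import (
    "log"
    "math/rand"
    "net/http"
    "time"
)

// Timed logs the total handling time for roughly one request in a thousand.
func Timed(h http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if rand.Float64() < 0.001 {
            start := time.Now()
            defer func() {
                log.Printf("%s %s took %s", r.Method, r.URL.Path, time.Since(start))
            }()
        }
        h.ServeHTTP(w, r)
    })
}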

Josef Grahn

I'd use a slightly different mechanism for this, probably a worker pool as described here:

https://gobyexample.com/worker-pools

I'd actually say keep 10,000 goroutines running (they'll be sleeping, waiting to receive on a blocking channel, so it's not really a waste of resources), and send the request and response to the pool as they come in.

If you want a timeout that responds with an error when the pool is full, you could implement that with a select block as well.
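Roughly something like this (the job type and the Pool name are just illustrative choices; note that the handler still has to block until its worker has finished writing the response):

package wrappers

import "net/http"

// job carries one request to a worker, plus a channel to signal completion.
type job struct {
    w    http.ResponseWriter
    r    *http.Request
    done chan struct{}
}

// Pool starts n workers that serve the requests sent to them by the handler.
func Pool(h http.Handler, n int) http.Handler {
    jobs := make(chan job)
    for i := 0; i < n; i++ {
        go func() {
            for j := range jobs {
                h.ServeHTTP(j.w, j.r)
                close(j.done)
            }
        }()
    }

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        j := job{w: w, r: r, done: make(chan struct{})}
        jobs <- j // blocks while all n workers are busy
        <-j.done  // wait for the worker to finish writing the response
    })
}

For the timeout variant, wrap the jobs <- j send in a select with a time.After case and write the error response when it fires.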

Sudhir Jonathan
  • The prebuilt pool also has the advantage of immediately telling you what your resource utilization is likely to be, and holding it constant. So you're less likely to be surprised by load spikes. – Sudhir Jonathan Aug 28 '17 at 07:19