Here is a rate limiter implementation based on @tonywl's (and somewhat related to Duarte Meneses's leaky bucket). The idea is the same - use a "token pool" to allow both rate limiting and bursting (making multiple calls in a short time after idling for a bit).
This implementation offers two main differences:
- Lock-free concurrent access using atomic operations.
- Instead of blocking a request, it calculates the delay needed to enforce the rate limit and offers that as the response, allowing the caller to enforce the delay - this works better with the asynchronous programming you'll find in modern networking frameworks (see the usage sketch after the code).
The full implementation with documentation can be found in this GitHub Gist, which is also where I'll post updates, but here's the gist of it:
import java.util.concurrent.atomic.AtomicLong;

public class RateLimiter {
    private final static long TOKEN_SIZE = 1_000_000 /* tockins per token */;
    private final double tokenRate; // measured in tokens per ms
    private final double tockinRate; // measured in tockins per ms
    private final long tockinsLimit;

    private final AtomicLong available;
    private final AtomicLong lastTimeStamp;
    /**
     * Create a new rate limiter with the token fill rate specified as
     * {@code fill}/{@code interval} and a maximum token pool size of {@code limit}, starting
     * with a {@code prefill} amount of tokens ready to be used.
     * @param prefill instead of starting with an empty pool, assume we "start from rest" and
     *   have tokens to consume. This value is clamped to {@code limit}.
     * @param limit The maximum number of tokens in the pool (burst size)
     * @param fill How many tokens are added to the pool every {@code interval}
     * @param interval How long, in ms, it takes to get {@code fill} tokens back in the pool
     */
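    // For example (illustrative values, not from the original answer):
    //   new RateLimiter(5, 5, 10, 1000) starts with 5 tokens ready, allows bursts of
    //   up to 5, and refills at 10 tokens per 1000 ms - one token every 100 ms.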
    public RateLimiter(int prefill, int limit, int fill, long interval) {
        this.tokenRate = (double)fill / interval;
        this.tockinsLimit = TOKEN_SIZE * limit;
        this.tockinRate = tokenRate * TOKEN_SIZE;
        this.lastTimeStamp = new AtomicLong(System.nanoTime());
        // clamp the prefill to the pool limit, as documented
        this.available = new AtomicLong(Math.min(prefill, limit) * TOKEN_SIZE);
    }
    public boolean allowRequest() {
        return whenNextAllowed(1, false) == 0;
    }

    public boolean allowRequest(int cost) {
        return whenNextAllowed(cost, false) == 0;
    }

    public long whenNextAllowed(boolean alwaysConsume) {
        return whenNextAllowed(1, alwaysConsume);
    }
    /**
     * Check when the next call will be allowed, according to the specified rate.
     * The value returned is in milliseconds. If the result is 0 - or if {@code alwaysConsume}
     * was specified - then the RateLimiter has recorded that the call has been allowed.
     * @param cost How costly the requested action is. The base rate is 1 token per request,
     *   but the client can declare a more costly action that consumes more tokens.
     * @param alwaysConsume if set to {@code true} this method assumes that the caller will delay
     *   the action that is rate limited but will perform it without checking again - so it will
     *   consume the specified number of tokens as if the action has gone through. This means that
     *   the pool can get into a deficit, which will further delay additional actions.
     * @return how many milliseconds before this request should be let through.
     */
    public long whenNextAllowed(int cost, boolean alwaysConsume) {
        var now = System.nanoTime();
        var last = lastTimeStamp.getAndSet(now);
        // calculate how many tockins we accumulated since the last call. Because a token is
        // 1,000,000 tockins and a millisecond is 1,000,000 nanoseconds, the token-per-ms rate
        // is also the tockin-per-ns rate, so we can apply it directly to the elapsed nanoseconds.
        // If the previous call was less than a microsecond ago, we still accumulate at least
        // one tockin, which is probably more than we should, but this is too small to matter - right?
        var add = (long)Math.ceil(tokenRate * (now - last));
        var nowAvailable = available.addAndGet(add);
        // clamp the pool to its limit; loop because other threads may be adding tockins concurrently
        while (nowAvailable > tockinsLimit) {
            available.compareAndSet(nowAvailable, tockinsLimit);
            nowAvailable = available.get();
        }
        // answer the question: how long until enough tockins accumulate to cover the cost
        var toWait = (long)Math.ceil(Math.max(0, (cost * TOKEN_SIZE - nowAvailable) / tockinRate));
        if (alwaysConsume || toWait == 0) // the caller will let the request go through, so consume the tokens now
            available.addAndGet(-cost * TOKEN_SIZE);
        return toWait;
    }
}
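To show what "offer the delay as the response" looks like in practice, here's a minimal usage sketch - the RateLimitedSender class, its submit() method, and the constructor values are made up for this example and aren't part of the gist; it just pairs the limiter with a plain ScheduledExecutorService:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RateLimitedSender {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    // 10 tokens per second with bursts of up to 5 (see the parameter notes above)
    private final RateLimiter limiter = new RateLimiter(5, 5, 10, 1000);

    /** Schedule the action to run as soon as the rate allows, without ever blocking. */
    public void submit(Runnable action) {
        // consume a token up front (alwaysConsume = true) and get the delay to enforce
        long delayMs = limiter.whenNextAllowed(true);
        scheduler.schedule(action, delayMs, TimeUnit.MILLISECONDS);
    }
}

Because alwaysConsume is set, every queued action takes its token immediately and can push the pool into deficit: a burst is admitted right away, and everything submitted after it gets spaced out at the steady rate.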