The legitimate users of my site occasionally hammer the server with API requests that cause undesirable results. I want to institute a limit of no more than, say, one API call every 5 seconds or n calls per minute (I haven't figured out the exact limit yet). I could obviously log every API call in a DB and do the calculation on every request to see if they're over the limit, but all this extra overhead on EVERY request would be defeating the purpose. What are other, less resource-intensive methods I could use to institute a limit? I'm using PHP/Apache/Linux, for what it's worth.
-
Is this just a bandage while you tweak the API or add more servers? It's very dangerous to take something away from / put restrictions on developers... – Austin Salonen Sep 03 '09 at 19:51
-
No, I'm trying to put reasonable limits in place to make the site sustainable. Adding server capacity for a few overzealous users isn't part of the plan. – scotts Sep 03 '09 at 20:10
8 Answers
Ok, there's no way to do what I asked without any writes to the server, but I can at least eliminate logging every single request. One way is the "leaky bucket" throttling method, which only keeps track of the last request ($last_api_request) and a ratio of requests to the limit for the time frame ($minute_throttle). The leaky bucket never resets its counter (unlike the Twitter API's throttle, which resets every hour), but if the bucket becomes full (the user has reached the limit), they must wait n seconds for the bucket to empty a little before they can make another request. In other words, it's a rolling limit: previous requests within the time frame slowly leak out of the bucket, and it only restricts you if you fill the bucket.
This code snippet calculates a new $minute_throttle value on every request. I specified the minute in $minute_throttle because you can add throttles for any time period, such as hourly or daily, although more than one will quickly become confusing for users.
$minute = 60;
$minute_limit = 100;  # users are limited to 100 requests/minute

$last_api_request = $this->get_last_api_request();  # from the DB, in epoch seconds
$last_api_diff = time() - $last_api_request;         # seconds since the last request
$minute_throttle = $this->get_throttle_minute();     # from the DB; null on the first request

if ( is_null( $minute_throttle ) ) {
    $new_minute_throttle = 0;
} else {
    # leak: subtract the time elapsed since the last request, floored at zero
    $new_minute_throttle = $minute_throttle - $last_api_diff;
    $new_minute_throttle = $new_minute_throttle < 0 ? 0 : $new_minute_throttle;
    # fill: each request adds ($minute / $minute_limit) = 0.6 seconds to the bucket
    $new_minute_throttle += $minute / $minute_limit;
    $minute_hits_remaining = floor( ( $minute - $new_minute_throttle ) * $minute_limit / $minute );
    # can output this value with the request if desired:
    $minute_hits_remaining = $minute_hits_remaining >= 0 ? $minute_hits_remaining : 0;
}

if ( $new_minute_throttle > $minute ) {  # bucket is full: reject this request
    $wait = ceil( $new_minute_throttle - $minute );
    usleep( 250000 );  # brief pause to blunt rapid-fire retries
    throw new My_Exception( 'The one-minute API limit of ' . $minute_limit
        . ' requests has been exceeded. Please wait ' . $wait . ' seconds before attempting again.' );
}

# Save the values back to the database.
$this->save_last_api_request( time() );
$this->save_throttle_minute( $new_minute_throttle );
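To see why these constants work: each request deposits $minute / $minute_limit = 0.6 seconds into the bucket, while elapsed time drains it at one second per second. A client that sustains more than 100 requests/minute deposits faster than the bucket drains, eventually pushing the level past 60 and getting throttled; a slower client never fills it.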

You can control the rate with the token bucket algorithm, which is comparable to the leaky bucket algorithm. Note that you will have to share the state of the bucket (i.e., the number of tokens) across processes (or whatever scope you want to control). So you might want to think about locking to avoid race conditions.
The good news: I did all of that for you: bandwidth-throttle/token-bucket
use bandwidthThrottle\tokenBucket\Rate;
use bandwidthThrottle\tokenBucket\TokenBucket;
use bandwidthThrottle\tokenBucket\storage\FileStorage;

$storage = new FileStorage(__DIR__ . "/api.bucket"); // bucket state is shared across processes via this file
$rate    = new Rate(10, Rate::SECOND);               // refill 10 tokens per second
$bucket  = new TokenBucket(10, $rate, $storage);     // capacity of 10 tokens
$bucket->bootstrap(10);                              // initialize the bucket once, starting full

if (!$bucket->consume(1, $seconds)) {                // $seconds is set by reference to the wait time
    http_response_code(429);
    header(sprintf("Retry-After: %d", floor($seconds)));
    exit();
}
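Note that a single api.bucket file like the one above throttles all clients together; to limit each consumer separately you would presumably create one bucket (one storage file) per API key.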

-
Thanks for the link to the token bucket algorithm - without that I would not have realized that it and leaky bucket were bona fide algorithms. – Colin Jun 26 '17 at 15:37
The simplest solution would be to just give each API key a limited number of requests per 24 hours, and reset them at some known, fixed time.
If they exhaust their API requests (i.e., the counter reaches zero or the limit, depending on the direction you're counting), stop serving them data until you reset their counter.
This way, it will be in their best interest not to hammer you with requests.
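A minimal sketch of that counter, assuming hypothetical get_quota()/save_quota() helpers backed by whatever storage you already have, and an $api_key taken from your request authentication:

$daily_limit = 5000;                         # requests allowed per API key per day (pick your number)
$today = gmdate( 'Y-m-d' );                  # counters are keyed by UTC date, so they reset at midnight UTC
$quota = get_quota( $api_key, $today );      # hypothetical lookup; returns 0 for an unseen key/day

if ( $quota >= $daily_limit ) {
    $seconds_to_reset = 86400 - ( time() % 86400 );  # seconds until the next UTC midnight
    http_response_code( 429 );
    header( 'Retry-After: ' . $seconds_to_reset );
    exit;
}
save_quota( $api_key, $today, $quota + 1 );  # hypothetical write-back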

I don't know if this thread is still alive or not, but I would suggest keeping these statistics in an in-memory cache like memcached. This will reduce the overhead of logging each request to the DB but still serve the purpose.
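For instance, a fixed-window counter costs only one atomic increment per request. A rough sketch using the PHP Memcached extension, where the key scheme and limits are placeholders and $api_key is assumed to identify the caller:

$window = 60;   # window length in seconds
$limit = 100;   # requests allowed per window
$key = 'throttle:' . $api_key . ':' . floor( time() / $window );

$memcached = new Memcached();
$memcached->addServer( 'localhost', 11211 );

$memcached->add( $key, 0, $window * 2 );  # create the counter if it doesn't exist; no-op otherwise
$hits = $memcached->increment( $key );    # atomic increment, so concurrent requests don't race

if ( $hits > $limit ) {
    http_response_code( 429 );
    exit;
}

The expiry lets stale windows clean themselves up, so nothing ever has to be logged or pruned.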

-
I agree completely and we implement this way as well as its also atomic. You could use something like AWS elasticache to store them and then have a cronjob just log the aggregated results afterward into a database. We actually have a small memcached instance per server to do incrementing and then flush/increment this to elasticache once a minute - that way you don't move the bottleneck to elasticache either. – Ross Apr 22 '13 at 19:08
-
@Kedar you can still log all the calls in a file for different kinds of analyses, which would not bother your DB, just queuing the writes in the disk buffer. – kommradHomer Feb 21 '14 at 16:10
-
Would Redis be a better solution? It's in RAM but also non-volatile? – BeardedGeek Apr 19 '15 at 09:29
In addition to implementing it from scratch, you can also take a look at API infrastructure like 3scale (http://www.3scale.net), which does rate limiting as well as a bunch of other things (analytics, etc.). There's a PHP plugin for it: https://github.com/3scale/3scale_ws_api_for_php.
You can also stick something like Varnish in front of the API and do the API rate limiting that way.

You say that "all this extra overhead on EVERY request would be defeating the purpose", but I'm not sure that's correct. Isn't the purpose to prevent hammering of your server? This is probably the way I would implement it, as it really only requires a quick read/write. You could even farm out the API checks to a different DB/disk if you were worried about the performance.
However, if you want alternatives, you should check out mod_cband, a third-party Apache module designed to assist in bandwidth throttling. Despite being primarily for bandwidth limiting, it can throttle based on requests-per-second as well. I've never used it, so I'm not sure what kind of results you'd get. There was another module called mod_throttle as well, but that project appears to be closed now, and was never released for anything above the Apache 1.3 series.

-
Yeah, I'll probably have to save something on disk.. preferably not every single log request though. I could just save the last successful API request and make sure it's n seconds later than that. – scotts Sep 04 '09 at 21:17
Couldn't this be done really simply with a session?
Compare microtime() to $_SESSION['last_access_microtime'].
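A minimal sketch of that idea (note that sessions are keyed by a cookie, so a client that drops the cookie gets a fresh timer; fine for casual overuse, not for deliberate abuse):

session_start();

$min_interval = 5.0;  # required seconds between calls
$now = microtime( true );
$last = isset( $_SESSION['last_access_microtime'] ) ? $_SESSION['last_access_microtime'] : 0;

if ( $now - $last < $min_interval ) {
    http_response_code( 429 );
    exit;
}
$_SESSION['last_access_microtime'] = $now;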

In Node.js, there is a package named express-rate-limit which does exactly what you are trying to accomplish.
It limits the number of requests in a period of time. I don't know if we have the same thing in PHP.

-
The user specifically asked for answers related to PHP. Unfortunately, in this regard, your comment isn't helping at all. – Sascha Aug 07 '23 at 12:34