
I use a variety of 3rd party web APIs, and many of them enforce rate limiting. It would be very useful to have a fairly generic PHP library that I could rate limit my calls with. I can think of a few ways to do it, perhaps by putting calls into a queue with a timestamp of when the call can be made, but I was hoping to avoid reinventing the wheel if someone else has already done this well.

Aaron F Stanton
  • I can't take credit, but I used this approach since there was no 'generic' package; depending on your coding approach, I guess you could make it one. https://stackoverflow.com/questions/1375501/how-do-i-throttle-my-sites-api-users – Brian Nov 23 '10 at 15:36

7 Answers


You can do rate limiting with the token bucket algorithm. I implemented that for you in PHP: bandwidth-throttle/token-bucket:

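<?php
use bandwidthThrottle\tokenBucket\Rate;
use bandwidthThrottle\tokenBucket\TokenBucket;
use bandwidthThrottle\tokenBucket\storage\FileStorage;

require __DIR__ . "/vendor/autoload.php"; // Composer autoloader

// Keep the bucket state in a file so it survives between PHP requests.
$storage = new FileStorage(__DIR__ . "/api.bucket");
// Refill at 10 tokens per second; the bucket holds at most 10 tokens.
$rate    = new Rate(10, Rate::SECOND);
$bucket  = new TokenBucket(10, $rate, $storage);
// Put 10 initial tokens into the storage.
$bucket->bootstrap(10);

// On failure, consume() sets $seconds to the time until a token is available.
if (!$bucket->consume(1, $seconds)) {
    http_response_code(429);
    // Round up so a limited client is never told "Retry-After: 0".
    header(sprintf("Retry-After: %d", ceil($seconds)));
    exit();
}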
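To show what the algorithm itself does, here is a minimal in-memory token bucket written from scratch. This is a sketch, not the bandwidth-throttle/token-bucket API: `SimpleTokenBucket`, `tryConsume()` and the injectable `$clock` are hypothetical names, it assumes PHP 8.0+ (constructor property promotion), and its state lives in one object, so unlike the `FileStorage` example it does not persist between separate PHP requests.

```php
<?php

/**
 * Minimal in-memory token bucket (illustrative sketch, PHP 8.0+).
 * Tokens refill continuously at $tokensPerSecond up to $capacity;
 * each call spends $cost tokens if enough are available.
 */
final class SimpleTokenBucket
{
    private float $tokens;
    private float $lastRefill;
    /** @var callable Returns the current time in seconds (injectable for tests). */
    private $clock;

    public function __construct(
        private int $capacity,
        private float $tokensPerSecond,
        ?callable $clock = null
    ) {
        $this->clock = $clock ?? static fn (): float => microtime(true);
        $this->tokens = $capacity;            // start with a full bucket
        $this->lastRefill = ($this->clock)();
    }

    /**
     * Spend $cost tokens if possible. Returns 0.0 on success, otherwise the
     * number of seconds until enough tokens will have accumulated
     * (nothing is spent in that case).
     */
    public function tryConsume(int $cost = 1): float
    {
        $now = ($this->clock)();

        // Refill for the elapsed time, never exceeding the capacity.
        $this->tokens = min(
            $this->capacity,
            $this->tokens + ($now - $this->lastRefill) * $this->tokensPerSecond
        );
        $this->lastRefill = $now;

        if ($this->tokens >= $cost) {
            $this->tokens -= $cost;
            return 0.0;
        }

        return ($cost - $this->tokens) / $this->tokensPerSecond;
    }
}
```

Returning the wait time instead of sleeping leaves the choice to the caller: a client can `usleep()` for that long, while a server (with persistent storage) can answer with HTTP 429 and a `Retry-After` header, as above.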
Markus Malkusch

I realize this is an old thread, but I thought I'd post my solution since it was based on something else I found on Stack Exchange. I looked for a while for an answer myself but had trouble finding anything good. It's based on a Python solution discussed there, but I've added support for variable-sized requests and turned it into a function generator using PHP closures.

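/**
 * Returns a closure that sleeps just long enough to keep callers
 * at or below $rate units per $per seconds.
 */
function ratelimiter($rate = 5, $per = 8) {
  $last_check = microtime(true);
  $allowance = $rate; // start with a full allowance

  return function ($consumed = 1) use (
    &$last_check,
    &$allowance,
    $rate,
    $per
  ) {
    $current = microtime(true);
    $time_passed = $current - $last_check;
    $last_check = $current;

    // Replenish the allowance for the elapsed time, capped at $rate.
    $allowance += $time_passed * ($rate / $per);
    if ($allowance > $rate)
      $allowance = $rate;

    if ($allowance < $consumed) {
      // Sleep until the deficit has been replenished, then spend all of it.
      $duration = ($consumed - $allowance) * ($per / $rate);
      $last_check += $duration; // the sleep time is already accounted for
      // usleep() takes an int; passing a float with a fraction is deprecated in PHP 8.1.
      usleep((int) ceil($duration * 1000000));
      $allowance = 0;
    }
    else
      $allowance -= $consumed;
  };
}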

It can be used to limit just about anything. Here's a stupid example that limits a simple statement at the default five "requests" per eight seconds:

$ratelimit = ratelimiter();
while (True) {
  $ratelimit();
  echo "foo".PHP_EOL;
}

Here's how I'm using it to limit batched requests against the Facebook Graph API at 600 requests per 600 seconds based on the size of the batch:

$ratelimit = ratelimiter(600, 600);
while (..) {
  ..

  $ratelimit(count($requests));
  $response = (new FacebookRequest(
    $session, 'POST', '/', ['batch' => json_encode($requests)]
  ))->execute();

  foreach ($response->..) {
    ..
  }
}

Hope this helps someone!

mwp
  • Just what I wanted. Thanks heaps! – eozzy Aug 27 '16 at 05:20
    I wouldn't use usleep()/sleep(), because it negates the whole purpose of rate limiting! Instead, send an HTTP 429 response when the rate limit is exceeded. Sleeping the application still uses CPU cycles and costs resources in the long run (especially for many app/API calls). – GTodorov Feb 28 '22 at 23:28
  • @GTodorov I completely disavow this code I wrote seven years ago. :p However, I think you've misunderstood the purpose of this function. It's a client-side rate limiter, to use when you are calling out to a remote service and need to wait a bit between requests. – mwp Mar 01 '22 at 22:15

This is essentially the same as @Jeff's answer, but I have tidied the code up a lot and added PHP 7.4 type declarations for properties, parameters and return values.

I have also published this as a composer package: https://github.com/MacroMan/rate-limiter

composer require macroman/rate-limiter

<?php

namespace RateLimiter;

/**
 * Class Limiter
 *
 * @package RateLimiter
 */
class Limiter
{
    /**
     * Limit to this many requests
     *
     * @var int
     */
    private int $frequency = 0;

    /**
     * Limit for this duration
     *
     * @var int
     */
    private int $duration = 0;

    /**
     * Current instances
     *
     * @var array
     */
    private array $instances = [];

    /**
     * RateLimiter constructor.
     *
     * @param int $frequency Maximum number of calls per window
     * @param int $duration  Window length in seconds
     */
    public function __construct(int $frequency, int $duration)
    {
        $this->frequency = $frequency;
        $this->duration = $duration;
    }

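    /**
     * Sleep until another call fits in the window, then record it
     */
    public function await(): void
    {
        $this->purge();

        // Wait for the oldest recorded call to leave the window; loop in
        // case the sleep ends slightly early.
        while (!$this->is_free()) {
            usleep((int) ceil($this->duration_until_free()));
            $this->purge();
        }

        // Record this call only once it is actually allowed to run
        $this->instances[] = microtime(true);
    }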

    /**
     * Remove expired instances
     */
    private function purge(): void
    {
        $cutoff = microtime(true) - $this->duration;

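        $this->instances = array_filter($this->instances, function ($a) use ($cutoff) {
            return $a >= $cutoff;
        });

        // array_filter() preserves keys; re-index so $this->instances[0] is the oldest entry
        $this->instances = array_values($this->instances);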
    }

    /**
     * Can we run now?
     *
     * @return bool
     */
    private function is_free(): bool
    {
        return count($this->instances) < $this->frequency;
    }

    /**
     * Get the number of microseconds until we can run the next instance
     *
     * @return float
     */
    private function duration_until_free(): float
    {
        $oldest = $this->instances[0];
        // Timestamps are in seconds; convert to microseconds only at the end
        $free_at = $oldest + $this->duration;
        $now = microtime(true);

        return ($free_at < $now) ? 0 : ($free_at - $now) * 1000000;
    }
}

Usage is the same:

use RateLimiter\Limiter;

// Limit to 6 iterations per second
$limiter = new Limiter(6, 1);

for ($i = 0; $i < 50; $i++) {
    $limiter->await();

    echo "Iteration $i" . PHP_EOL;
}
MacroMan
  • Hi, I kept getting an `Undefined offset: 0` error on the first line of `duration_until_free()`. I fixed this by re-indexing the array with `$this->instances = array_values($this->instances);` after filtering it in `purge()`. Apparently `array_filter()` preserves the original keys, so it leaves gaps and index 0 may be undefined. – Edgar Nov 22 '21 at 15:36
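The gap Edgar describes is easy to reproduce in isolation: `array_filter()` keeps the original keys, and `array_values()` re-indexes from 0. The timestamps below are made-up values for illustration.

```php
<?php

// Three "timestamps"; everything older than $cutoff should be dropped.
$timestamps = [10.0, 20.0, 30.0];
$cutoff = 15.0;

// array_filter() keeps the original keys, so the result has keys 1 and 2.
$kept = array_filter($timestamps, fn (float $t): bool => $t >= $cutoff);
var_dump(array_key_exists(0, $kept)); // bool(false)

// array_values() re-indexes from 0, so $kept[0] is the oldest entry again.
$kept = array_values($kept);
var_dump($kept[0]); // float(20)
```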

As an alternative, I've (in the past) created a "cache" folder that stored API responses, so if I make the same call again within a specific time window, it is served from the cache first (more seamless) until it's okay to make a new call. You may end up with stale information in the short term, but it saves you from the API blocking you in the long term.
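A minimal sketch of that idea, assuming a file-based cache. `cached_call()` is a hypothetical helper; the cache directory, the SHA-1 filename and the `serialize()` format are illustrative choices, not details from the answer.

```php
<?php

/**
 * Return the cached result for $key if it is younger than $ttl seconds;
 * otherwise call $fetch() (the real API request) and cache its result.
 * Hypothetical helper for illustration.
 */
function cached_call(string $key, int $ttl, callable $fetch, ?string $dir = null)
{
    $dir = $dir ?? sys_get_temp_dir() . '/api-cache';
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }

    // Hash the key so arbitrary request parameters give a safe filename.
    $file = $dir . '/' . sha1($key) . '.cache';

    // Fresh enough? Serve from the cache without touching the API.
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return unserialize(file_get_contents($file));
    }

    // Only this path spends one of the API's rate-limited calls.
    $result = $fetch();
    file_put_contents($file, serialize($result), LOCK_EX);
    return $result;
}
```

Build `$key` from everything that affects the response (method, URL, parameters). Only unserialize files your own code wrote; if the API returns JSON, caching the raw JSON string avoids `unserialize()` altogether.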

Brad Christie
    Caching is only useful if I'm calling a given API with the same parameters. That's a step in the right direction, but I will often be varying parameters and expecting different results. Also, some APIs forbid caching in their TOS. – Aaron F Stanton Nov 25 '10 at 04:15

I liked mwp's answer and I wanted to convert it to OO to make me feel warm and fuzzy. I ended up drastically rewriting it to the point that it is totally unrecognizable from his version. So, here is my mwp-inspired OO version.

Basic explanation: every time await is called, it throws out all timestamps that aren't relevant anymore (older than the interval duration) and saves the current timestamp in an array. If the rate limit is exceeded, it calculates the time until a slot frees up again and sleeps until then.

Usage:

$limiter = new RateLimiter(4, 1); // can be called 4 times per 1 second
for($i = 0; $i < 10; $i++) {
    $limiter->await();
    echo microtime(true) . "\n";
}

I also added a little syntactic sugar for a run method.

$limiter = new RateLimiter(4, 1);
for($i = 0; $i < 10; $i++) {
    $limiter->run(function() { echo microtime(true) . "\n"; });
}
<?php

class RateLimiter {
    private $frequency;
    private $duration;
    private $instances;
 
    public function __construct($frequency, $duration) {
        $this->frequency = $frequency;
        $this->duration = $duration;
        $this->instances = [];
    }

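    public function await() {
        $this->purge();

        // Sleep until the oldest call leaves the window; loop in case the
        // sleep ends slightly early.
        while(!$this->is_free()) {
            usleep((int) ceil($this->duration_until_free()));
            $this->purge();
        }

        // Record this call only once it is actually allowed to run
        $this->instances[] = microtime(true);
    }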

    public function run($callback) {
        if(!is_callable($callback)) {
            return false;
        }

        $this->await();
        $callback();

        return true;
    }
    
    public function purge() {
        $this->instances = RateLimiter::purge_old_instances($this->instances, $this->duration);
    }
    
    public function duration_until_free() {
        return RateLimiter::get_duration_until_free($this->instances, $this->duration);
    }

    public function is_free() {
        return count($this->instances) < $this->frequency;
    }

    public static function get_duration_until_free($instances, $duration) {
        $oldest = $instances[0];
        $free_at = $oldest + $duration; // timestamps are in seconds
        $now = microtime(true);

        if($free_at < $now) {
            return 0;
        }
        else {
            return ($free_at - $now) * 1000000; // microseconds, for usleep()
        }
    }

    public static function purge_old_instances($instances, $duration) {
        $now = microtime(true);
        $cutoff = $now - $duration;
        // array_values() re-indexes so $instances[0] is always the oldest entry
        return array_values(array_filter($instances, function($a) use ($cutoff) {
            return $a >= $cutoff;
        }));
    }
}
Jeff
  • Replace ```usleep($wait_duration);``` with ```usleep(floor($wait_duration));``` to remove the PHP 8.1 deprecation warning about implicit float-to-int conversion losing precision. – WiiLF Jan 30 '23 at 21:09
  • Nice callout @wiilf. Thanks. – Jeff Feb 01 '23 at 18:33

PHP code that limits access to your API to one request every 5 seconds per client IP address, using Redix.

Install the Predis client, which talks to Redix over the Redis protocol (RESP):

composer require predis/predis

Download Redix (https://github.com/alash3al/redix/releases) for your operating system, then start the service:

./redix_linux_amd64

The output below shows that Redix is listening on port 6380 for the RESP protocol and on port 7090 for HTTP.

redix resp server available at : localhost:6380
redix http server available at : localhost:7090

At the top of your API script, add the following code:

<?php
require_once 'class.ratelimit.redix.php';

$rl = new RateLimit();
$waitfor = $rl->getSleepTime($_SERVER['REMOTE_ADDR']);
if ($waitfor > 0) {
  http_response_code(429); // Too Many Requests
  header('Retry-After: ' . $waitfor);
  echo 'Rate limit exceeded, please try again in ' . $waitfor . 's';
  exit;
}

// Your API response
echo 'API response';

The source code for the script class.ratelimit.redix.php is :

<?php
require_once __DIR__.'/vendor/autoload.php'; // Composer autoloader (loads Predis)

class RateLimit {

  private $redis;
  const RATE_LIMIT_SECS = 5; // allow 1 request every x seconds

  public function __construct() {
    $this->redis = new Predis\Client([
      'scheme' => 'tcp',
      'host'   => 'localhost', // or the server IP on which Redix is running
      'port'   => 6380
    ]);
  }

  /**
   * Returns the number of seconds to wait until the next time the IP is allowed
   * @param string $ip
   */
  public function getSleepTime($ip) {
    $value = $this->redis->get($ip);
    if (empty($value)) {
      // If the key doesn't exist, store the current timestamp with an
      // expiration of RATE_LIMIT_SECS (the TTL argument is in milliseconds)
      $this->redis->set($ip, time(), self::RATE_LIMIT_SECS * 1000);
      return 0;
    }
    return self::RATE_LIMIT_SECS - (time() - intval(strval($value)));
  } // getSleepTime
} // class RateLimit
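To check the arithmetic of `getSleepTime()` without a Redix server, here is an in-memory sketch of the same rule (one request per interval per key). `InMemoryIntervalLimiter` and its injectable clock are hypothetical; state lives in a PHP array, so this only suits tests or a single long-running process, not separate web requests. It assumes PHP 8.0+.

```php
<?php

/**
 * In-memory stand-in for the Redix-backed RateLimit class (sketch only).
 * An entry older than the interval is treated as absent, which plays the
 * role of the expiring key.
 */
final class InMemoryIntervalLimiter
{
    /** @var array<string, int> key => time of the last allowed request */
    private array $lastAllowed = [];
    /** @var callable Returns the current Unix time in seconds. */
    private $clock;

    public function __construct(private int $intervalSecs, ?callable $clock = null)
    {
        $this->clock = $clock ?? static fn (): int => time();
    }

    /** Seconds the caller must wait; 0 means the request is allowed and recorded. */
    public function getSleepTime(string $key): int
    {
        $now = ($this->clock)();
        $last = $this->lastAllowed[$key] ?? null;

        if ($last === null || $now - $last >= $this->intervalSecs) {
            $this->lastAllowed[$key] = $now;
            return 0;
        }

        // Same formula as the Redix version: interval minus elapsed time.
        return $this->intervalSecs - ($now - $last);
    }
}
```

Note that in the Redix version the `get()` followed by `set()` is not atomic, so two simultaneous requests from the same IP can both be allowed.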
Yash

Reviewing the other answers, I decided that one thing is missing here: asynchronous handling.

Most proposed solutions are synchronous: they wait until the rate limit expires and force the client to wait with them. Even if the limit expires next midnight ;-)

In short, when your code communicates with a rate-limited service, you do not call it directly; you put the call in a queue and continue execution to do other stuff. Or, if that other stuff depends on the service response, just quit and notify the client that the work is queued.

Eventually the queue handler (which calls the external service at the specified rate) will launch the request and persist the response.

This approach guarantees that:

  • no requests are lost; every request is eventually served,
  • client responses are quick and non-blocking; the client does not have to wait with you until the limit expires,
  • parallel execution is enabled both for your code and for the queue handler.

Necessary precondition:

  • clients must accept that the code is asynchronous: a request to your code does not produce an immediate result from the external service, but at least it returns quickly.

Other important points for the queue workers to address:

  • track the usage of the rate-limited service and pause execution if limits are reached;

  • check the persisted result before calling the external service. Clients commonly launch several requests with the same arguments if they don't receive immediate results, even if you tell them to wait a bit before reloading the page. So the probability of finding identical requests in the queue is high, and we don't want to exhaust our limits on them.

So our code pushes requests into the queue (RabbitMQ, Redis or your preferred one) and returns a message saying the process has started and to check back soon.

Now RabbitMQ takes the message with the request description and hands it to one of your workers (yes, there are several running in parallel). The worker code needs to:

  • verify whether the request was already executed and persisted; if so, acknowledge and drop the message as executed and take a new one;

  • decide whether to call the external service or wait;

  • if we need to wait (the non-expired flag is set): requeue the message, sleep for a minute and quit;

  • if we can run: launch the external service call and analyse the response;

  • if the response says the rate limit was reached, set the corresponding flag with an expiration delay for all workers (use persistent storage like Redis, a DB, etc.; avoid memcache or a shared folder), requeue the message, wait a minute and quit;

  • if the response is successful, persist the result and reset the flag if needed.
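The worker steps above can be sketched as one function. This is an illustration under stated assumptions, not a RabbitMQ client: the queue is an `SplQueue`, a plain array stands in for Redis/DB storage, `$callService` is a hypothetical callable returning `['status' => int, 'body' => mixed]`, and the caller passes in the current time. Instead of sleeping for a minute and quitting, the function returns a status string so the flow is easy to follow.

```php
<?php

/**
 * Process one queued request (sketch). $storage holds persisted results
 * under "result:<hash>" and the shared "limited_until" flag.
 */
function process_one(SplQueue $queue, array &$storage, callable $callService, int $now, int $backoffSecs = 60): string
{
    if ($queue->isEmpty()) {
        return 'idle';
    }
    $request = $queue->dequeue();
    $key = 'result:' . sha1(json_encode($request));

    // Already executed and persisted: acknowledge and drop the duplicate.
    if (isset($storage[$key])) {
        return 'duplicate';
    }

    // Shared rate-limit flag still active: requeue and back off.
    if (($storage['limited_until'] ?? 0) > $now) {
        $queue->enqueue($request);
        return 'waiting';
    }

    $response = $callService($request);

    // Limit reached: set the flag for all workers, then requeue.
    if ($response['status'] === 429) {
        $storage['limited_until'] = $now + $backoffSecs;
        $queue->enqueue($request);
        return 'limited';
    }

    // Success: persist the result and reset the flag.
    $storage[$key] = $response['body'];
    unset($storage['limited_until']);
    return 'done';
}
```

A real worker would `nack`/requeue through the broker and sleep before exiting; keeping the flag in shared storage is what lets all parallel workers back off together.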

Stas Trefilov