3

I have a list of CIDR like this:

192.168.0.1/24
10.0.0.1/32
etc...

The list is growing.
In order to check if an IP fits within one of these CIDR, I perform a loop with following function:

function cidr_match($ip, $range){
    list ($subnet, $bits) = explode('/', $range);
    $ip = ip2long($ip);
    $subnet = ip2long($subnet);
    $mask = -1 << (32 - $bits);
    $subnet &= $mask; // in case the supplied subnet was not correctly aligned
    return ($ip & $mask) == $subnet;
}

Since my CIDR list is growing, I would like to improve that function in order to avoid testing each line of CIDR one by one until it returns true. I would like to get rid of the for loop surrounding my function above.
Is there a way to perform kind of a "pre-check" on the IP I am about to check, so that it does not run the full list sequentially (from top to bottom)?
I want to optimize so that my code would behave this way: give the IP to the function --> the function kinds of "sorts" the list or "finds" the most probable CIDR --> run the check on the IP for the most probable CIDR(s) --> return "true" as soon as possible
A guidance will be appreciated.

Musa
  • 1,334
  • 1
  • 11
  • 21

2 Answers2

2

Honestly, unless your CIDR range is humongous and you're checking lots of IPs within the same process, you're probably not going to see much in the way of performance gains. However, if that IS the scenario you're looking at, then you could look into trying to squeeze a bit of performance by pre-processing your ranges and IP (perform the ip2long() calls once and store the separated mask/subnet for comparison).

For example, this is the way you're doing it today, I'm assuming:

<?php
// Original style
$ranges = array(
  "192.168.0.1/32",
  "192.168.0.1/26",
  "192.168.0.1/24",
  "192.168.0.1/16",
  "127.0.0.1/24",
  "10.0.0.1/32",
  "10.0.0.1/24"
);


// Run the check
$start = microtime(true);
find_cidr("10.0.0.42", $ranges);
find_cidr("192.168.0.12", $ranges);
find_cidr("10.0.0.1", $ranges);
$end = microtime(true);
echo "Ran 3 find routines in " . ($end - $start) . " seconds!\n";

function find_cidr($ip, $ranges)
{
  foreach($ranges as $range)
  {
    if(cidr_match($ip, $range))
    {
      echo "IP {$ip} found in range {$range}!\n";
      break;
    }
  }  
}

function cidr_match($ip, $range){
    list ($subnet, $bits) = explode('/', $range);
    $ip = ip2long($ip);
    $subnet = ip2long($subnet);
    $mask = -1 << (32 - $bits);
    $subnet &= $mask; // in case the supplied subnet was not correctly aligned
    return ($ip & $mask) == $subnet;
}

On my machine, that runs in about 0.0005 - 0.001 seconds (checking 3 IPs against a small handful of ranges).

If I write something to pre-process the ranges:

<?php
// Slightly-optimized style

$ranges = array(
  "192.168.0.1/32",
  "192.168.0.1/26",
  "192.168.0.1/24",
  "192.168.0.1/16",
  "127.0.0.1/24",
  "10.0.0.1/32",
  "10.0.0.1/24"
);

$matcher = new BulkCIDRMatch($ranges);
$start = microtime(true);
$matcher->FindCIDR("10.0.0.42");
$matcher->FindCIDR("192.168.0.12");
$matcher->FindCIDR("10.0.0.1");
$end = microtime(true);
echo "Ran 3 find routines in " . ($end - $start) . " seconds!\n";


class BulkCIDRMatch
{
  private $_preparedRanges = array();

  public function __construct($ranges)
  {
    foreach($ranges as $range)
    {
      list ($subnet, $bits) = explode('/', $range);
      $subnet = ip2long($subnet);
      $mask = -1 << (32 - $bits);
      $subnet &= $mask; // in case the supplied subnet was not correctly aligned

      $this->_preparedRanges[$range] = array($mask,$subnet);
    }
  }

  public function FindCIDR($ip)
  {
    $result = $this->_FindCIDR(ip2long($ip));
    if($result !== null)
    {
      echo "IP {$ip} found in range {$result}!\n";
    }
    return $result;
  }

  private function _FindCIDR($iplong)
  {
    foreach($this->_preparedRanges as $range => $details)
    {
      if(($iplong & $details[0]) == $details[1])
      {
        return $range;
      }
    }

    // No match
    return null;
  }
}

...then I see faster CHECKING but there's slightly more overhead at the beginning when the class is initialized and it processes and stores all the ranges. So if I timed the OVERALL run with just 3 IPs against a handful of ranges, the "optimized" way is actually a little slower. But if I run 1,000 IPs against 10,000 CIDRs, the "optimized" way will have more noticeable improvements over the original way (at the expense of additional memory usage to store the pre-processed range data).

So it's all really up to volume and what you're trying to do.

That said, if you're worried about 0.001 second performance being too slow, then PHP might not be the right language to use for your checks. Or at least you might want to consider writing a custom extension so more of the processing is done in C.

EDIT: To answer the original question about finding "probable" ranges to check (prior to any sort of conversion from its string form), it's probably not a very reliable thing to try. Ranges can span past their initial octets, so if you start comparing those values (e.g. "I'm looking at 192.168.1.0, so I'm only going to look at ranges starting in 192"), you're not only incurring the performance overhead of the string comparison on each entry (which slows down the overall lookup), but you could potentially miss out on a valid range.

jhilgeman
  • 1,543
  • 10
  • 27
  • You then propose to sequentially read a list of "ip2longs" instead of a list of "CIDRs", which is about the same issue. The keyword is "probable" indeed. I appreciate your help (and the benchmark!) – Musa Jan 18 '18 at 00:57
  • You could potentially separate the ranges by their starting long into separate arrays (e.g. 0-10000000, 10000001-20000000, etc) and do a first pass through the shorter list of arrays and then just loop through any ranges inside the general vicinity. But again, at these speeds, the volume has to be enormous to make much difference and you'd have to tweak it to find the right trade-offs. – jhilgeman Jan 18 '18 at 02:10
2

If you're actually concerned about performance then you should be storing the list in something resembling a structure, and search it in a way that doesn't mean looking at every entry until you find a match.

In this case it's a sorted list and a binary search:

class CidrList {

    protected $ranges = [];

    public function addRanges($ranges) {
        foreach($ranges as $range) {
            $this->addRange($range);
        }
        $this->sortRanges();
    }

    public function findRangeByIP($ip) {
        return $this->_findRangeByIP(ip2long($ip));
    }

    // simple binary search
    protected function _findRangeByIP($ip, $start=NULL, $end=NULL) {
        if( $end < $start || $start > $end ) { return false; }

        if( is_null($start) ) { $start = 0; }
        if( is_null($end)   ) { $end = count($this->ranges) -1; }

        $mid = (int)floor(($end + $start) / 2);
        switch( $this->inRange($ip, $this->ranges[$mid]) ) {
            case 0:
                return $this->ranges[$mid][2];
            case -1:
                return $this->_findRangeByIP($ip, $start, $mid-1);
            case 1:
                return $this->_findRangeByIP($ip, $mid+1, $end);
        }
    }

    // add a single range, protected as the list must be sorted afterwards.
    protected function addRange($range) {
        list ($subnet, $bits) = explode('/', $range);
        $subnet = ip2long($subnet);
        $mask = -1 << (32 - $bits);
        $min = $subnet & $mask;
        $max = $subnet | ~$mask;
        $this->ranges[] = [$min, $max, $range];
    }

    // sort by start, then by end. aka from narrowest overlapping range to widest
    protected function sortRanges() {
        usort($this->ranges, function($a, $b) {
            $res = $a[0] - $b[0];
            switch($res) {
                case 0:
                    return $a[1] - $b[1];
                default:
                    return $res;
            }
        });
    }

    protected function inRange($ip, $range) {
        list($start, $end, $cidr) = $range;
        if( $ip < $start ) { return -1; }
        if( $ip > $end ) { return 1; }
        return 0;
    }
}

Usage:

$l = new CidrList();
$l->addRanges(["192.168.0.1/16", "192.168.0.1/24", "127.0.0.1/24", "10.0.0.1/24"]);

var_dump(
    $l->findRangeByIP('192.168.0.10'),
    $l->findRangeByIP('192.168.1.10'),
    $l->findRangeByIP('1.2.3.4')
);

Output:

string(14) "192.168.0.1/24"
string(14) "192.168.0.1/16"
bool(false)

Additionally, you should avoid continually re-processing the strings by either caching the entire CidrList object, or its internal set of ranges.

Sammitch
  • 30,782
  • 7
  • 50
  • 77
  • That seems way closer to what I wanted to achieved. Thank you very much! Would you recommend a way to cache the object? I'm thinking about the good old file cache way... (https://blog.graphiq.com/500x-faster-caching-than-redis-memcache-apc-in-php-hhvm-dcd26e8447ad) – Musa Jan 18 '18 at 04:53