PHP String Evaluation Method

Question

Hoping one of the great minds can help me here.

I have a situation where I will receive an international phone number from a provider and I have to do a database lookup and figure out the country, region, and phone type that the phone number is calling.

For example, say I will receive a phone number +393234567891. I have to look up in a table the country that phone number belongs to. so I know that '39' is Italy, and I have that as a country code in the database but I have to determine whether the number is a landline or a cell phone. For that I need more information out of the phone number so '39' is a landline and '393' is a cell phone. I need to see that the phone number contains '393' and therefore I know that it is a cell phone.

My question is what is the best way to evaluate this? Would it be to loop through each segment of the phone number like first compare the first two phone numbers against the database, then the first three, then the first four until I come back with one single result? For example, if I continue with this example and compare '39' for Italy against the db I will come back with a bunch of results because there are '39' and '393' and '3939' and so on. So what is the best way to use the entire phone number to get the exact match for the phone number prefix?

I would have thought to just loop through the phone number and add a digit of the phone number to the loop until I come back with only one result, I just want to make sure that this is the most efficient way to accomplish this.

Any recommendations? Thanks!

You are asking about comparing the data, but not about getting the country and area code data, correct? — Pekka, Apr 06 '11 at 09:51

vbence · Accepted Answer · 2011-04-14T12:08:51.453

2

I assume you have a table like:

prefix (id, number)

with data like:

1, '39'
2, '393'
3, '33'
4, '331'

You can get the longest match with a reverse LIKE:

SELECT id
FROM prefix
WHERE "393234567891" LIKE CONCAT(number, "%")
ORDER BY LENGTH(number)
LIMIT 1;

I have not tested it, but assuming your shortest prefix is 2 characters, you may get some improvement with (this will only check prefixes starting with 39, that is 1% of all the prefixes you have):

SELECT id
FROM prefix
WHERE "393234567891" LIKE CONCAT(number, "%")
    AND number LIKE "39%"
ORDER BY LENGTH(number)
LIMIT 1;

Then you can have a different table with the informaton attached to that prefix like:

prefixinfo (id, prefix_id, type, data)

with data like:

1, 1, 'country', 'Italy'
2, 2, 'country', 'Italy'
3, 2, 'type',    'Landline'
4, 3, 'country', 'France'
5, 4, 'country', 'France'
6, 4, 'city',    'Paris'

edited Apr 14 '11 at 12:08

answered Apr 06 '11 at 10:05

vbence

20,084
9
69
118

Yes - this is how I solved the problem - NB there is a BUG in the code above - 'ORDER BY number.length' should be 'ORDER BY LENGTH(number) DESC' and you can add the type to the prefix table (number, country, type) and fetch the result in a single operation. – symcbean Apr 06 '11 at 12:31
@symcbean Thnaks, corrected! But I can not add the information chunks to the prefix because there is a one-to-many relationship between them. As the query will only return the ID of `393` (as this is the most specific), all the info has to be available starting from this record. - As the `LIKE` is the most costy operation, I rather not get ALL the prefixes wich apply, but only the most specific. – vbence Apr 06 '11 at 14:31
I just don't get why you separated 'country'and 'type' into 2 rows for prefix_id=2. Otherwise I like this solution. – Slava Apr 06 '11 at 14:58
@Slava I modelled the problem as chunks of information which can freely attached to a prefix. As you can see the in Paris example we know the location of that phone number. In case of mobile phone we do not know the location, just that it's a mobile phone. And for example with *satellite phones* we do not even know the country. I could create a table with many-many fields most of which would be empty in most cases, but I don't like that kind of redundancy. Besides, in this model you can freely add new types of information, without changing the table structure. – vbence Apr 06 '11 at 15:08
I see. Well, you could also combine all that info in one field (xml/serialize/whatsoever) and have even less columns (3 versus yours 4) but ok, it's just another approach :). – Slava Apr 06 '11 at 15:36

score 0 · Answer 2 · answered Apr 06 '11 at 09:48

0

if an actual phone number is fixed size you can remove it. with sub_strreplace; e.g. numbers are 8 digits long:

$code = substr_replace($number,'',-1,8);

$code now will contain only code part. so you can easily count digits and find out what you need.

answered Apr 06 '11 at 09:48

Headshota

21,021
11
61
82

This won't help, because he has a number of prefixes with different lengths (like "39" and "393" and "3939"). He needs to compare the number against the *longest matching prefix* – Pekka Apr 06 '11 at 09:56

score 0 · Answer 3 · answered Apr 06 '11 at 10:11

The last 3 that signifies a mobile in 393, is that the same for every country?

The ideal situation would be to have a table for countries and then another table with related prefixes

Countries table                        Subsearch Table

countryMatch: 39                       substrMatch: 3 // for 393              
countryName: "Italy"                   substrCountry: 39
                                       substrMeaning: "cell"
                                       ...................
                                       substrMatch: 5 // 395
                                       substrCountry: 39
                                       substrMeaning: "something else"

That way once you have determined the country you can limit the rest of your searches for further limiting e.g. 393, 3939.

I reckon your proposed method is sound, looping through bit by bit till you find a match using SQL queries. So find the country code by popping off the first two digits (39), and if found query the subsearch table for results. With those results loop through appending them to the country code and see if you get a match

$subsearchArr = array("3" => "cell","5" => "something else") # from the database 
$match = false;
$country = 39;

foreach($subsearchArr as $key => $value)
{
  # append $key to $country e.g. 393, 395
  # if this is a match to the string
  # set match to true and do your logic
}

if($match == false) # no match so landline
{
  # logic here if landline
}

I reckon that would work, but I guess i'd have to see the exact data structure to be sure. But yeah two tables is definitely desirable

score -1 · Answer 4 · answered Apr 06 '11 at 10:37

It may be better to do the comparisons in PHP with simple array loops, even if you have the data in Mysql. Build from the database (and cache) a PHP array of expected country codes and known unique prefixes within each country to differentiate between mobile, landline, areas etc.

For each country code you have, see if the input phone number starts with that code. Once you find the country, remove the country code from the phone number and test the remaining number against a list of known mobile number prefixes in that country. If found, it's mobile. If not found, it's landline.

For instance, in Greece the country code is 30 and all mobiles start with 69 after that. If, however, you're realing with countries where mobile number prefixes are indistinguishable from area codes (such as USA and Canada) you are out of luck.

function checkMSISDN($msisdn) {

    $countries = array(
        'gr' => array(
            'countryPrefix' => '30',
            'mobilePrefix' => '3069',
            'length' => 12,
        ),
        'it' => array(
            'countryPrefix' => '39',
            'mobilePrefix' => '393',
            'length' => 12,
        ),
    ) ;

    foreach ($countries as $countryName => $countryRules)  {

        $msisdnCurrent = $msisdn ;

        $countryPrefix = $countryRules['countryPrefix'] ;
        $fullPrefix = $countryRules['mobilePrefix'] ;

        //remove leading + if any
        if (substr($msisdnCurrent, 0, 1) == '+') {
            $msisdnCurrent = substr($msisdnCurrent, 1) ;
        }

        //remove leading 00 if any
        if (substr($msisdnCurrent, 0, 2) == '00') {
            $msisdnCurrent = substr($msisdnCurrent, 2) ;
        }

        $msisdnLength = strlen($msisdnCurrent) ;
        if ($msisdnLength != $countryRules['length']) {
            //sanity check, not this country
            continue ;
        }

        if (substr($msisdnCurrent, 0, strlen($countryPrefix)) != $countryPrefix) {
            //not this country
            continue ;
        }

        if (substr($msisdnCurrent, 0, strlen($fullPrefix)) != $fullPrefix) {
            //not mobile
            return "isLandline" ;
        }
        else {
            //mobile
            return "isMobile" ;
        }
    }
    return false ;
}

You should rather let database do it. If scripts were that good working through large sets of data databases whouldn't have been implementing so complex query language. — Slava, Apr 06 '11 at 14:55
@Slava I'm not disagreeing in general. In this case, however, the rules for determining what country an MSISDN is in and if it's a mobile number is a small data set whereas your input will be large (multitude of MSISDNs to check). Store the rules in the database for better management but cache them into PHP and do the processing in memory. Then it's just simple substring comparison, not even substring search. PHP is great at that. Hitting the database with a substring search will get heavy quickly. — Fanis Hatzidakis, Apr 06 '11 at 17:46

PHP String Evaluation Method

4 Answers4