
Is there a best practice or common algorithm for converting a natural search string for a location (US only) into its separate components?

For example:

City Name, ST 00000

TO

city => City Name
state => ST
zipcode => 00000

This is for a form, so I don't need to handle every possible permutation. I can restrict the format to something like: city, st 00000. However, I need to handle the omission of any of the segments within that format, so that they are optional to some extent. Some examples of supported combinations (case insensitive):

00000 // zipcode
00000-0000 // zipcode (ZIP+4)
city, st // city and state - comma separated
city st // city and state - space separated
city, st 00000 // city, state and zip
st 00000 // state and zip - though I only really need the zip
city 00000 // city and zip - though I only really need the zip

I can also use a static set of state abbreviations, so those could be matched to validate a state segment if needed.

prodigitalson
  • Stand back! I know Regular Expressions! http://xkcd.com/208/ – corsiKa Feb 23 '11 at 03:06
  • If you want this to be general, I'd send it off to a location API like Google Maps. – ide Feb 23 '11 at 03:13
  • @glowcoder: hahahaha... priceless. Still, it seems like this is a fairly common thing. Right now I'm not implementing geospatial searching (when I do, I'll use an API); I just need to split the string (from a single form field) into potential values for a quick search of the DB... – prodigitalson Feb 23 '11 at 03:24

2 Answers

<?php
    function uslocation($string)
    {
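        // Fill it with states (longer spellings first, so 'D.C.' wins over 'DC')
        $states = array('D.C.', 'D.C', 'DC', 'TX', 'CA', 'ST');

        // Extract state: case-insensitive, and only when the abbreviation is a
        // whole token followed by nothing but separators and an optional zip,
        // so 'CA' is not cut out of a city name such as 'Camden'
        $state = '';
        foreach($states as $st)
        {
            $pattern = '/(?<![A-Za-z])' . preg_quote($st, '/') . '(?=[\s,]*[\d\-]*\s*$)/i';
            if(preg_match($pattern, $string, $m, PREG_OFFSET_CAPTURE))
            {
                $state = $m[0][0];
                $string = substr_replace($string, '', $m[0][1], strlen($state));
                break; // stop at the first state found
            }
        }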

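        // Extract zipcode: digits with an optional hyphenated suffix.
        // The match must start with a digit, so the hyphen in a city name
        // such as 'Winston-Salem' is not mistaken for a zipcode.
        if(preg_match('/(\d+(?:-\d+)?)/', $string, $zipcode))
        {
            $zipcode = $zipcode[1];
            $string = str_replace($zipcode, '', $string);
        }
        else
        {
            $zipcode = '';
        }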

        return array(
            'city' => trim(str_replace(',', '', $string)),
            'state' => $state,
            'zipcode' => $zipcode,
        );
    }

    // Some tests
    $check = array(
        'Washington D.C.',
        'City Name TX',
        'City Name, TX',
        'City Name, ST, 0000',
        'NY 7445',
        'TX 23423',
    );

    echo '<pre>';
    foreach($check as $chk)
    {
        echo $chk . ": \n";
        print_r(uslocation($chk));
        echo "\n";
    }
    echo '</pre>';
?>
delphist
  • This could work... but I think I would rather hit it multiple times so I can handle any combination (unless you have some insane regex to handle that in one shot, ha). This will be a bit slower, but nothing unacceptable, I think. – prodigitalson Feb 23 '11 at 03:28
  • I could help you if you give some examples of combinations. – delphist Feb 23 '11 at 03:36
  • Are states always two characters (CA, TX, UT)? – delphist Feb 23 '11 at 03:50

While I was researching, I found some other code referenced in another SO question, which I used while I was waiting. I modified the code here to support getting the zipcode as well as the city and state: http://www.eotz.com/2008/07/parsing-location-string-php

Others might also find this useful.
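
For anyone who just wants the general shape of this kind of parse, here is a minimal sketch. It is not the code from the linked article: the function name parse_us_location and the truncated state list are assumptions made for illustration. It peels the zipcode off the end of the string, then a trailing two-letter token that is checked against the static state list, and treats whatever remains as the city.

```php
<?php
// Hypothetical helper: parses "city, st 00000" where every segment is optional.
function parse_us_location($input)
{
    // Assumption: truncated list for illustration; extend to the full set.
    $states = array('CA', 'NY', 'TX');

    $result = array('city' => '', 'state' => '', 'zipcode' => '');
    $rest = trim($input);

    // 1. Peel a trailing ZIP or ZIP+4 off the end of the string.
    if (preg_match('/(?<![\d-])(\d{5}(?:-\d{4})?)$/', $rest, $m)) {
        $result['zipcode'] = $m[1];
        $rest = rtrim(substr($rest, 0, strlen($rest) - strlen($m[1])), " ,");
    }

    // 2. Peel a trailing two-letter token, but only if it is a known state.
    if (preg_match('/(?:^|[\s,])([A-Za-z]{2})$/', $rest, $m)
        && in_array(strtoupper($m[1]), $states, true)) {
        $result['state'] = strtoupper($m[1]);
        $rest = rtrim(substr($rest, 0, strlen($rest) - 2), " ,");
    }

    // 3. Whatever is left is treated as the city.
    $result['city'] = $rest;

    return $result;
}
```

Calling parse_us_location('austin tx') returns the city as 'austin', the state as 'TX' and an empty zipcode. Working from the end of the string means a city that merely contains a state abbreviation is left alone, at the cost of requiring the segments to appear in the city, state, zip order.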

@delphist: Thanks. Once I have time to compare accuracy and performance, I may switch to your code if it's better. It's certainly simpler and shorter! If I do, I'll mark it as the official answer.

prodigitalson