1

I have some strings from our accounting system I need to process. The accounting system only gives the option to put in the postal code and city in one input field. The data is later exported through xml and imported in a php system.

I'm looking for a way to extract the postal code from the city, however these come in various formats so a simple substr(); is not working

Some examples of the values I need to process are:

1234 ZC ALPHEN AAN DEN RIJN
1234SG UTRECHT
33602 BIELEFELD
W7 3QB LONDON

How do I split the postal code from city for each of these? I already contacted the manufacturer of the accounting system, and they understood my problem and will look into splitting the values in 2 for future calls, but that will take some time.

JoshDM
  • 4,939
  • 7
  • 43
  • 72
Snuur
  • 311
  • 1
  • 3
  • 11
  • what country are you getting postcodes for? UK/US ? – Samuel Cook Jul 01 '13 at 14:33
  • Is the first postal code "1234 ZC" or "1234"? The reason I ask is because it's hard to tell the computer what to do when it's even hard for humans to decipher. – Expedito Jul 01 '13 at 14:37
  • i agree, it WILL last a long time. i suspect there is 0 tolerance on converting errors here, so the only way is by hand. i am guessing spaces between postal codes, dashes, letters and numbers mixed together, in every possible combination. oh boy. – Sharky Jul 01 '13 at 14:37
  • well its for all types of countries, at the moment 99% in europe. First postal code is 1234 ZC, some collegues put a space in between the numbers and letters of a dutch postal code, others dont. There is tolerance for error, I put these values in input fields with a submit button, so they can be edited when errors are spotted. When submitted an xml file is uploaded to our shipper with shipping data. They want postal code and city seperate in there. – Snuur Jul 01 '13 at 14:42

2 Answers2

2

It's not in keeping with Google's Terms and Conditions unless you're storing this data to be displayed on a Google map, but it is awfully tempting to harness their power because they're just so good at this stuff.

The Geocoding API will be able to handle pretty much any address/postcode combination and variation you can throw at it - with or without spaces, postcode first or last, etc. etc., including different place names ("London", "Londres").

A request to

http://maps.googleapis.com/maps/api/geocode/json?address=2408%20ZC%20ALPHEN%20AAN%20DEN%20RIJN&sensor=false

returns a JSON stream containing, among other things:

 "address_components" : [
        {
           "long_name" : "2408 ZB",
           "short_name" : "2408 ZB",
           "types" : [ "postal_code" ]
        },
        {
           "long_name" : "Alphen aan den Rijn",
           "short_name" : "Alphen aan den Rijn",
           "types" : [ "locality", "political" ]
        },
 ...

This page outlines the requirements and limitations for using the service.

Note that the Google API will guess stuff if the data is slightly wrong. Your initial example of 1234 ZC isn't correct and the API will interpolate in an attempt to give you something you work with. Make sure you explore how the API reacts to incorrect data, and be careful not to shoot yourself in the foot with the results.

Pekka
  • 442,112
  • 142
  • 972
  • 1,088
  • that's nice but if there is 0 tolerance on errors, the result of this conversion must be reviewed and verified by human. – Sharky Jul 01 '13 at 14:40
  • @Sharky sure. From experience, it's *very* reliable as long as the incoming data is good - at least in western Europe and the US, I guess. It can be slightly off if the data contains even a small mistake, the API will then start interpolating, sometimes with strange results. – Pekka Jul 01 '13 at 14:42
  • yup, it is very reliable, but the incoming data will NOT be "good". i have worked a lot with user data input. it's madness. users can't enter their name correctly 2 times in a row. combo boxes wherever they are applicable are a life saver. – Sharky Jul 01 '13 at 14:48
  • Thank you, I will try both this and the other answer and see what works best for now. Result is eventually also checked by the user that ships the freight. – Snuur Jul 01 '13 at 15:08
0

If you know the country at the time you are attempting to split the postal code off from the city you could use that to look up a regular expression (or similar piece of data) that corresponds to the correct way to parse out the postal code.

For example, you might map countries to regexes in an array (these regular expressions are just samples -- not vigorously tested):

$regexMap = array(
    'US' => '(\d{5}|\d{5}-\d{4}|\d{9})\s+(.*)',
    'UK' => '([\d\w]{2,4}\s+\d\w{2})\s+(.*)',
    ...
);

$regularExpression = $regexMap[$country];
preg_match($regularExpression, $incomingPostalCodeAndCity, $postalData);
$postalCode = $postalData[0];
$city = $postalData[1];

While you probably can combine regular expressions for some (many?) countries, postal codes vary enough that you'll probably still need a fairly lengthy list of regexes.

Each regex should be designed to return the postal code as the first subpattern and the city as the second subpattern.

There is some related information in the answers to this question: What is the ultimate postal code and zip regex? (including some lists of postal code regular expressions for various countries).

Community
  • 1
  • 1
CBerube
  • 111
  • 3