4

Someone please put me out of my misery and help me solve this one.

I have a postcode lookup box that allows people to enter full postcodes (e.g. BS34 5GF) or partial postcodes (e.g. BS34).

My postcode lookup only requires the first part of the postcode and I am trying to find the most effective way of trimming the string to only have the first section, without explicitly knowing the format it is entered in.

Here are some example codes: B2 5GG, B22 5GG, BS22 5GG, B25GG, BS25GG, BS225GG, B2, BS2, BS22

This shows how many possible variations there could be. What is the best way to ensure I always get the first part of the postcode?

David Shaw
  • Maybe I'm wrong, but I thought British postcodes always had THREE trailing characters. If this is the case, the answer is simple. – itsols May 03 '13 at 13:02
  • It sounds like you've tried a few things. What have you tried so the community can better make suggestions without suggesting stuff you've already tried? I don't know enough of how UK postcodes are formatted but I could see a good regex here. I imagine there's a lot out there already, but that you would have tried that (which would lead me to ask, why it fails, but goes back to my first point of asking what you've tried). – jmbertucci May 03 '13 at 13:03
  • What do you mean by "first part"? How many characters make up the first part? –  May 03 '13 at 13:09
  • Thanks for the comments. The problem is that the first part can be 2, 3 or 4 characters long, while the second part is always 3 characters long. If the user always entered a full postcode, simply removing the last 3 characters would be enough; however, that does not work when they enter only the area code (first part). – David Shaw May 03 '13 at 13:43
  • I have tried removing all spaces and then removing the last three characters, but this fails when only the first part is entered. I am guessing that I need to work out the length of the string entered and then perform a different action depending on that length? In my head I am thinking that I need to explode the string on a space and take the first section, but this would not work for people who enter their full postcode without a space. – David Shaw May 03 '13 at 13:45
  • See also: http://stackoverflow.com/questions/164979/uk-postcode-regex-comprehensive (accepted answer includes the official government-supplied regex for matching postcodes) – Spudley May 03 '13 at 14:04
  • @Spudley: the government-official regex does not apply to parts of a postcode. – Sébastien Renauld May 03 '13 at 14:11
  • @SébastienRenauld - no, but it's fairly trivial to remove (or make optional) the trailing three characters from the regex to match the post-town section. 99.9% of the time, when people talk about a part postcode, they really mean the post-town section. – Spudley May 03 '13 at 14:18
  • @Spudley: If you have an entire postcode, it's fairly trivial. If you have part of a postcode of arbitrary length, stuff becomes hairy...especially if you're dealing with foreigners (I've had someone enter OX135 on a postcode box before) – Sébastien Renauld May 03 '13 at 14:20

5 Answers

4

IMHO regexes are exactly the right solution to the problem.

Ignoring BFPO addresses for now, try:

if (preg_match("(([A-Z]{1,2}[0-9]{1,2})($|[ 0-9]))", trim($postcode), $match)) {
   $region=$match[1];
}
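
As a rough check, the pattern above can be run over the example codes from the question. This is only a sketch: the helper name `extractRegion` is made up for illustration, the `strtoupper` call is an added assumption so that lower-case input is accepted (the pattern itself lists upper-case letters only), and `/` delimiters are used here in place of the bracket-style delimiters.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function extractRegion(string $postcode): ?string
{
    // Assumption: normalise case first, since the pattern only lists A-Z.
    $postcode = strtoupper(trim($postcode));
    // Same pattern, written with "/" delimiters instead of "(" and ")".
    if (preg_match('/([A-Z]{1,2}[0-9]{1,2})($|[ 0-9])/', $postcode, $match)) {
        return $match[1];
    }
    return null; // no recognisable outward part
}

// Example codes taken from the question.
$examples = array("B2 5GG", "B22 5GG", "BS22 5GG", "B25GG", "BS25GG", "BS225GG", "B2", "BS2", "BS22");
foreach ($examples as $example) {
    echo $example . " => " . extractRegion($example) . "\n";
}
```

Formats with a trailing letter in the first part (such as EC1M) are not covered by this pattern and would need an extra branch.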
symcbean
3

If you use regular expressions to match British postcodes (part or whole), you're doing it wrong. Also, please note before going any further that, no matter how you write your code, there is one case where the format will be ambiguous: BS22 could very well belong to BS2 2AB or BS22 5GS. There is absolutely no way to tell, and you'll need to make a decision based on that.
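
The ambiguity can be seen with a small sketch. Once spaces are removed, both full postcodes from the example start with the same four characters, so the fragment alone cannot say which first part was meant. The helper name `couldBelongTo` is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper: does $fragment match the start of $fullPostcode
// once spaces and case differences are ignored?
function couldBelongTo(string $fragment, string $fullPostcode): bool
{
    $fragment = strtoupper(str_replace(' ', '', $fragment));
    $full     = strtoupper(str_replace(' ', '', $fullPostcode));
    // Compare only the first strlen($fragment) characters.
    return strncmp($full, $fragment, strlen($fragment)) === 0;
}

var_dump(couldBelongTo("BS22", "BS2 2AB"));  // full postcode whose first part is BS2
var_dump(couldBelongTo("BS22", "BS22 5GS")); // full postcode whose first part is BS22
```

Both calls report a match, which is why a decision has to be made about how to read such a fragment.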

The algorithm I am suggesting considers the case of BS22 to count as BS22. It is as follows:

<?php
function testPostcode($mypostcode) {
    // If there is a space, everything before it is the first part.
    if (($posOfSpace = strpos($mypostcode, " ")) !== false) return substr($mypostcode, 0, $posOfSpace);
    // Deal with the format BS000
    if (strlen($mypostcode) < 5) return $mypostcode;

    $shortened = substr($mypostcode,0,5);
    if ((string)(int)substr($shortened,4,1) === (string)substr($shortened,4,1)) {
       // BS000. Strip one and return
       return substr($shortened,0,4);
    }
    else {
      if ((string)(int)substr($shortened,3,1) === (string)substr($shortened,3,1)) {
         return substr($shortened,0,3);
      }
      else return substr($shortened,0,2);
    }
}

// Test cases
$postcodes = array("BS3 3PL", "BS28BS","BS34","BS345","BS32EQ");
foreach ($postcodes as $k => $v) {
   echo "<p>".$v." => ".testPostcode($v)."</p>";
}

This is both faster and simpler to maintain than a regular expression.

Sébastien Renauld
  • A little codepad of it: http://codepad.org/LqcwiOnc . Feel free to add postcodes or fragments to test. – Sébastien Renauld May 03 '13 at 14:06
  • When I tested this on the examples David gave, B25GG didn't work. You seem to have had a similar idea to me, but traversing through the String slightly differently. Maybe there is an error with your indexing? – Jon May 03 '13 at 14:12
  • @Jon: Good point, didn't test with one letter on there. Editing to fix it. – Sébastien Renauld May 03 '13 at 14:14
  • Well, the UK government provides an official regex for matching a postcode, so it can't be *that* wrong. That said, the pattern is complex, so a non-regex solution would likely be quicker. Your point about the ambiguity is well-made, but a part-postcode would generally be considered to be the first part; if I supplied `BS22` to a postcode field, I would expect it to understand that I meant `BS22 xxx`, not `BS2 2xx` - i.e. a part postcode is generally understood to indicate the post town, not just a postcode that is missing arbitrary characters from the end. – Spudley May 03 '13 at 14:14
  • @Spudley: The UK government provides a regex to match **complete** postcodes. Not the same thing at all. You will need a completely different and altogether more complex regex to take into account all possible permutations for a part of a postcode. String manipulation in that particular case is always faster. – Sébastien Renauld May 03 '13 at 14:17
  • @Jon: edited. If index 2 and 3 are digits, the script now returns appropriately. – Sébastien Renauld May 03 '13 at 14:17
2

What if you took out the spaces and then checked the length? I think all full postcodes have to be at least 5 characters long.

If the postcode is shorter than 5 characters, take the whole thing as the area code. If it is 5 characters or longer, remove the last 3 characters and take the remainder as the area code, unless the last character is a digit (an incomplete second part), in which case keep the first 4 characters:

function getPostCodeArea($pcode){
   $pcode = str_replace(' ', '', $pcode);
   if(strlen($pcode) > 4){
      // Square brackets: the {} string-offset syntax was removed in PHP 8.
      if(is_numeric($pcode[strlen($pcode)-1])){
        $pcode = substr($pcode, 0, 4);
      }else{
        $pcode = substr($pcode, 0, strlen($pcode)-3);
      }
      return $pcode;
   }else{
      return $pcode;
   }
}
Jon
1

This would do the job:

Note: this is a simplified postcode regex - there are better ones for validating more fully

function getOutwardPostcodePart($postcode) {

    $matches = array();

    if (preg_match("/^([a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1}) {0,1}([0-9][A-Za-z]{2}){0,1}$/", $postcode, $matches )) {

        return $matches[1];
    } 

    return false;
}

I don't think there's any way to handle the unlikely situation in which a valid outward postcode part is entered with only a partial inward part.
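
One possible way to loosen this, offered only as a sketch and not as part of the answer above, is to relax the inward group so that it accepts a digit followed by zero to two letters. The function name `getOutwardAllowingPartialInward` and the relaxed pattern are assumptions made for illustration. Without a space the split is still a guess: `BS25` is read as the outward part BS25, although it could also be BS2 followed by a lone 5.

```php
<?php
// Hypothetical variant of getOutwardPostcodePart(): the inward group is
// relaxed to one digit followed by zero, one or two letters.
function getOutwardAllowingPartialInward(string $postcode)
{
    $pattern = '/^([A-Z]{1,2}[0-9][0-9A-Z]?) ?([0-9][A-Z]{0,2})?$/i';
    if (preg_match($pattern, trim($postcode), $matches)) {
        // Assumption: return the outward part in upper case.
        return strtoupper($matches[1]);
    }
    return false;
}

var_dump(getOutwardAllowingPartialInward("BS22 5G")); // space makes the split explicit
var_dump(getOutwardAllowingPartialInward("B25G"));    // split found by backtracking
var_dump(getOutwardAllowingPartialInward("BS25"));    // ambiguous: the longer outward part wins
```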

Dan
  • Damnit, beaten to the punch by symcbean. I'll leave it in here as the regex I used is slightly different, so it may be useful. – Dan May 03 '13 at 15:56
1

I found this version works for me, which caters for upper and lower case entries as well as the Central London postcode format:

<?php
    $postcode = "BA12 1AB";
    if (preg_match("(([A-Za-z]{1,2}[0-9]{1,2})($|[ 0-9]))", trim($postcode), $match)) {     //  Caters for BA12 1AB and B1 2AB postcode formats
        $region=$match[1];
    } elseif (preg_match("(([A-Za-z]{1,2}[0-9]{1,2}[A-Za-z]{1})($|[ 0-9]))", trim($postcode), $match)) {        //  Caters for EC1M 1AB London postcode formats
        $region=$match[1];
    } else {
        $region="UK";
    }
    echo $region;
?>
Suraj Rao