I'm trying to extract the first part of a UK postcode from a string that may contain either the full postcode or only its first part, and I'm struggling to make it work. Using a look-ahead, I've got it working when the full postcode is entered, but I can't seem to make the look-ahead optional so that the first part is also matched when it is entered on its own.

My regex so far is ([A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW])(?=( ?[0-9][ABD-HJLNP-UW-Z]{2})))

I've got several postcodes that must match and these are the results using the above regex:

A10EA     - Should match and does
A1        - Should match but doesn't
A10 0EA   - Should match and does
A10       - Should match but doesn't
BH18 1AE  - Should match and does
BH18AE    - Should match and does
EC1M 6HJ  - Should match and does
EC1M      - Should match but doesn't
Z10 2EV   - Shouldn't match and doesn't
QE3 6DA   - Shouldn't match but matches E3 6DA

Can someone please help me solve this issue?

The RegEx I've been working from is the official one from the post office:

/^(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})$/i

Before anyone flags this as a duplicate of PHP Find first part of UK postcode when full or part can be entered, it's not. The answer for that question doesn't work, see my comment to the answer.

Styphon
  • Could you describe the structure of UK postcodes? – rr- Apr 02 '15 at 10:04
  • I've added the official RegEx supplied by the post office for UK postcodes, I'm not sure how else to describe the structure. – Styphon Apr 02 '15 at 10:06
  • I found [this](https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation) but I'm not sure if it's relevant. Chances are you're better off not using a regex and writing a simple function for this, since it's pretty complicated already. – rr- Apr 02 '15 at 10:07
  • Also if all you want is to prevent users from entering invalid postcodes, I think simple validation is enough. Some time ago I was unable to buy a phone on my operator's website, because it didn't have name of the street in its database and wouldn't let me proceed because of this. – rr- Apr 02 '15 at 10:12
  • No, I've already got that. I have a database of the first parts of the postcode, I need to get the postcode entered by the user, trim the second half if it exists, then check in the database to get a corresponding ID. – Styphon Apr 02 '15 at 10:13
  • Was just messing around and have this so far: `/^[\w]{1,2}[\d]{1,2}/i`. The results I get are: `A10`, `A1`, `A10`, `A10`, `BH18`, `BH18`, `EC1`, `EC1`, `Z10`, `QE3` respectively for your tests but I'm not sure about the `EC1M` postcodes as I've never seen that format before. If the `M` is allowed afterwards then I suppose you could add `\w?` to the end of the regular expression and that will include the `M` - **bug** that will actually give you `BH18A` for sixth test. – martincarlin87 Apr 02 '15 at 10:13
  • `EC1M` is for areas in London. – Styphon Apr 02 '15 at 10:16
  • @Styphon I probably could have guessed London was the source of the problem... Anyway, I updated my comment and can get the `M` but it breaks the sixth test, not sure how to fix it for that case at the moment. **Edit** just noticed it breaks the first test as well. – martincarlin87 Apr 02 '15 at 10:17
  • I've been looking at it the wrong way, trying to get the first part of the postcode. It's easy to check for the end of the postcode then if present remove it... – Styphon Apr 02 '15 at 10:33
  • @Styphon which is what my answer suggests... – thecoshman Apr 02 '15 at 10:37
  • Thanks everyone for all your help, I've added my answer for anyone who gets stuck on this in the future. – Styphon Apr 02 '15 at 10:39

2 Answers

According to this wiki page, a postcode always ends in 'digit letter letter', which as a regex pattern is \d\w\w$. Now that we know how to spot the end, we just want to capture the rest.

A pattern like (\S*)\s*\d\w\w$ will work. It captures the first half and ensures that you do not get the last 'digit letter letter' part. It captures the first part by taking anything that is not white space, which in a postcode means only letters and digits.

To fully explain this: the brackets () mark what we are capturing. \S means 'any one non white space character', with \S* being as many of them as we can get. So (\S*) captures everything up to a space character, but will capture everything if the user doesn't enter one. The rest of the regex then has to match 'any white space, one digit, two letters, end of string', which forces the capture to give characters back and ensures that AA999AA is split into AA99 and 9AA.

I've also just noticed though that your question states you might not actually have that second part. I think you could get around that by checking the string length. If you trim white space and the length is less than 5 characters, you must only have the first part, so no need for any regex.
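
As a rough sketch of how the length check and the capture pattern could fit together in PHP: the function name `first_part` is made up for illustration, and the input is assumed to have been validated already, since the pattern itself accepts almost anything.

```php
<?php
// Hypothetical helper (not from the answer): returns the first part of a
// postcode that is assumed to be valid already.
function first_part(string $postcode): string
{
    $postcode = strtoupper(trim($postcode));

    // Fewer than 5 characters once white space is removed:
    // only the first part was entered, so no regex is needed.
    if (strlen(preg_replace('/\s+/', '', $postcode)) < 5) {
        return $postcode;
    }

    // Greedy \S* gives characters back until 'digit letter letter'
    // fits at the end of the string.
    if (preg_match('/(\S*)\s*\d\w\w$/', $postcode, $m) === 1) {
        return $m[1];
    }

    return $postcode;
}
```

Calling `first_part('AA999AA')` takes the regex branch, while `first_part('EC1M')` is returned by the length check without touching the regex.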


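Disclaimer: this will not work for Anguillan postcodes. To support those as well, I think (\S*)\s*(?:\d[A-Za-z]{2}|-\d{4})$ would work. The letters have to be spelled out as [A-Za-z] here, because \w also matches digits and \d\w\w would otherwise swallow the last three digits of a code like AI-2640.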

thecoshman
  • I do have a full list of all the first parts, but even with that list `A1` and `A10` are both valid, I need to find a way to know that the first one is `A1` and not `A10`, any regex I've come up with so far always says it's `A10` if there's no space, and I can't guarantee there is a space. – Styphon Apr 02 '15 at 10:15
  • @Styphon complete re-write which should help you out here. – thecoshman Apr 02 '15 at 10:16
  • I'm still stuck with the same problem `?([0-9][ABD-HJLNP-UW-Z]{2})?)` does the same thing, matching an optional space, a number and two letters (but only the valid ones). The question mark at the end makes it optional, but if it is there how can I tell the regex only to select the first half? – Styphon Apr 02 '15 at 10:18

I've been looking at this the wrong way. I want to get the first part of the postcode and remove the second part if present, so why not validate the postcode first, then check for an end section and strip it if necessary?

I'm already validating the postcode, this is code I already had:

$validate = Validation::factory(array('postcode' => $postcode));
$validate->rule('postcode', 'not_empty');
$validate->rule('postcode', 'regex', array(':value', '/^(GIR ?(0AA)?|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?([0-9][ABD-HJLNP-UW-Z]{2})?)$/i'));
if ( ! $validate->check())
{
    $postcode = '';
}

So now I've added in this after it:

if ($postcode)
{
    $short_postcode = $postcode;
    // Check for an end section and then if present, remove it
    if (preg_match('/ ?([0-9][ABD-HJLNP-UW-Z]{2})$/i', $postcode, $match, PREG_OFFSET_CAPTURE))
    {
        $short_postcode = substr($postcode, 0, $match[0][1]);
    }
}

and this leaves me with just the first part of the postcode, which is what I wanted. This Eval.in shows it working for all the examples in my question.
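
For anyone without the framework's `Validation` class to hand, here is a self-contained sketch of the same two steps using only `preg_match`. The wrapper name `short_postcode` is made up for illustration; the two patterns are the ones from the code above, with the capture group around the end section dropped because only the match offset is used.

```php
<?php
// Hypothetical wrapper around the two steps above.
// Returns '' when the input is not a valid full or partial postcode.
function short_postcode(string $postcode): string
{
    $valid = '/^(GIR ?(0AA)?|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?([0-9][ABD-HJLNP-UW-Z]{2})?)$/i';
    if (preg_match($valid, $postcode) !== 1) {
        return '';
    }

    // Check for an end section; with PREG_OFFSET_CAPTURE, $match[0] is
    // [matched text, byte offset], so $match[0][1] is where the end starts.
    if (preg_match('/ ?[0-9][ABD-HJLNP-UW-Z]{2}$/i', $postcode, $match, PREG_OFFSET_CAPTURE)) {
        return substr($postcode, 0, $match[0][1]);
    }

    return $postcode;
}

// The examples from the question.
$examples = ['A10EA', 'A1', 'A10 0EA', 'A10', 'BH18 1AE', 'BH18AE',
             'EC1M 6HJ', 'EC1M', 'Z10 2EV', 'QE3 6DA'];
foreach ($examples as $input) {
    printf("%-9s => '%s'\n", $input, short_postcode($input));
}
```

Because the validation pattern is anchored with `^`, `QE3 6DA` is rejected outright instead of being matched from the `E` onwards.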

Styphon
  • If you have already validated that your string is a valid post code, then the removal of the second part can be just `\d\w\w` as it's much easier to read. I also think that the formal validation is over engineered, though of course I do not know your needs. Do you just need to validate that it is 'like' a postcode or that it actually IS a real postcode? – thecoshman Apr 02 '15 at 10:43
  • You should also be able to both validate and capture (the first part) in just one regex btw. – thecoshman Apr 02 '15 at 10:49
  • @thecoshman I've not been able to just capture the first part, that's why I asked this question in the first place. I can't get it to validate everything properly. And I use the formal validation because it must be a valid postcode, not just in the format of a postcode. – Styphon Apr 02 '15 at 12:03
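
For what it's worth, the single-regex idea from the comments might look something like the sketch below. It is derived from the Post Office pattern in the question by capturing only the first part and turning the end section into an optional non-capturing group; the function name `outward_code` is made up for illustration, and the `GIR 0AA` special case is left out to keep it short.

```php
<?php
// Hypothetical helper: validates and captures the first part in one pass.
// Returns null when the input is not a valid full or partial postcode.
function outward_code(string $postcode): ?string
{
    $pattern = '/^([A-PR-UWYZ](?:[0-9]{1,2}|[A-HK-Y][0-9][0-9ABEHMNPRV-Y]?|[0-9][A-HJKPS-UW]))'
             . '(?: ?[0-9][ABD-HJLNP-UW-Z]{2})?$/i';

    if (preg_match($pattern, trim($postcode), $m) === 1) {
        return strtoupper($m[1]);
    }

    return null;
}
```

The `^` and `$` anchors do the work that the look-ahead could not: for `A10EA` the engine first tries `A10`, finds that `EA` cannot finish the string, and backtracks to `A1` followed by `0EA`. For a bare `A10` nothing is left over, so the optional group is simply skipped.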