
I have a number of restaurants that each deliver to certain postcode areas in London, for example:

  • EC1
  • WC1
  • WC2
  • W1

When someone searches for a restaurant that delivers to their home, they enter their full postcode.

Some people enter the postcode correctly, with the space; others run all the letters and digits together with no separator. To normalize the input, I remove any spaces from the postcode before attempting a match.

Until now, I matched the postcode against the prefixes simply by checking whether it starts with the prefix in question, but I then realized that this is not foolproof:

  • WC1E123 => correct match for WC1
  • W1ABC => correct match for W1
  • W10ABC => incorrect match for W1, should only match the W10 prefix

How can I know, given a full postcode with no space, if it matches a given prefix, while not failing the W1 / W10 test above?

Is there any solution to the problem that does not involve forcing the customer to enter the postcode with the space in the correct position?

BenMorel
  • What language are you using? python? java? ruby? Befunge? Answers are going to depend on that. It's certainly possible to do regardless of what language, but the answer could be vastly different. – Ghost Feb 06 '14 at 23:20
  • @Ghost I'm using PHP, but it really doesn't matter, I'm looking for a pseudo-code, or even just the idea to get it right, and the code will follow. – BenMorel Feb 06 '14 at 23:25
  • Have a look at this one http://www.braemoor.co.uk/software/postcodes.shtml. Also I would vote to close the question as duplicate of http://stackoverflow.com/q/164979/1328439 – Dima Chubarov Feb 06 '14 at 23:27
  • @DmitriChubarov My question has nothing to do with this one, which I've read before. His question is about *validating* postcodes, while my question is about *matching* postcodes to prefixes. Two very different things. Same about your link, which is about validating postcodes. – BenMorel Feb 06 '14 at 23:29
  • Fair enough. Could you post the regex that you are currently using to match the prefix? It seems to me that the official regex would need a minor modification to make it match the prefix and suffix separately. Also, I'd keep my previous comment since that question seems relevant. It would be good to keep a link to it here. – Dima Chubarov Feb 06 '14 at 23:38
  • Is there some reason that you can't place the test for W10 _before_ the test for W1? That's the usual way to deal with cases where one prefix is a subset of another... test the longest one first. – Phil Perry Feb 06 '14 at 23:39
  • The easiest way to do it would be to separate the input into two separate groups, one for the first portion (before where the space would go) and one for the second portion, and then validate the first portion against your accepted delivery codes. This eliminates all of the difficulty entirely for both you and the site user. – Ken White Feb 07 '14 at 00:13
  • Can you have a database of all codes? If so, you can query it with progressively longer prefixes of the input until you find a single match, or take the first result from the last non-empty list. For example, if the user enters W1ABC, query with W and you get multiple records; query with W1 and you get multiple records such as W1, W10, W11; query with W1A and you get no records, so the first record from W1, W10, W11 (that is, W1) is your answer. For W10ABC, the same pattern gives a single record for W10, so that's your answer. I don't know whether it will cover all cases, but it may be worth considering. – cjd Feb 20 '14 at 06:24
  • @cjd UK postcodes databases are huge, and can change with time, so it's not really a viable solution for my use case unfortunately! – BenMorel Feb 20 '14 at 09:22
  • ...chop off the last 3 characters? – gvee Feb 25 '14 at 13:47

6 Answers


There are 6 possible formats for postcodes in the UK:

  • A9 9AA
  • A9A 9AA
  • A99 9AA
  • AA9 9AA
  • AA9A 9AA
  • AA99 9AA

I think there need to be two parts to your solution. The first is to validate the input; the second is to grab that first part.

Validation

I realise you have said this is not what you are trying to do, but validation is really important: without it you will struggle to extract the right prefix, and you may send your drivers to the wrong place!

There are a couple of ways to do it: either use a third party to help you capture a complete and correct address (many are available, including http://www.qas.co.uk/knowledge-centre/product-information/address-postcode-finder.htm, from my company), or at a minimum use a regex or similar sanity test to validate the postcodes, such as those in the links Dmitri gave you above.

If you look at the test cases you have listed, W1ABC and W10ABC are not valid postcodes. If we get that bit correct, the next bit becomes a lot easier.

Extract the Prefix

Assuming you now have a full, valid postcode, getting just the first part (the outcode) becomes a lot easier, with or without spaces. Because the second half (the incode) has a standard format of 9AA (digit-alpha-alpha), I would spot and remove it, leaving just your outcode, whether that is W1 from W1 0AA or W10 from W10 0AA.
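
As a rough illustration of the "spot and remove the incode" idea, here is a minimal Python sketch. The function name `split_postcode` is made up for this example, and the only check it performs is that the last three characters look like digit-alpha-alpha; it does not validate the postcode against any official list.

```python
def split_postcode(raw):
    """Return (outcode, incode), or None if the input does not end in 9AA.

    Hypothetical helper: it only checks the shape of the last three
    characters and the length of what remains, nothing more.
    """
    compact = "".join(raw.split()).upper()  # drop all whitespace, normalize case
    outcode, incode = compact[:-3], compact[-3:]
    looks_like_incode = (
        len(incode) == 3
        and incode.isascii()
        and incode[0].isdigit()
        and incode[1:].isalpha()
    )
    # Outcodes in the six formats listed above are 2 to 4 characters long.
    if not looks_like_incode or not 2 <= len(outcode) <= 4:
        return None
    return outcode, incode
```

With this, `W1 0AA` and `W10 0AA` yield the outcodes `W1` and `W10` respectively, whether or not the space is typed, while the question's `W1ABC` is rejected because `ABC` is not a 9AA incode.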

Alternatively, if you are using a 3rd party to capture the address - most of them will be able to return the incode and outcode separately for you.

Al Mills

The below graphic explains the format of UK postcodes:

[Image: Format of UK postcodes]

Source: https://www.getthedata.com/postcode (my site)

So you can see that you need the outcode, which, given your requirement (a full postcode with no space), is simply your space-free postcode minus the last three characters.

In PHP this would be:

$outcode = substr($postcode_no_space, 0, -3);

Of course this does not help with validating the postcode, but as you point out in your comments the question is not about validation.

Dan Winchester
  • Thanks for the answer and the graphic, better late than never :) That being said, my original requirement at the time was also to be able to match based on incomplete prefixes as well. When I was living in London, people would say "I live in WC2", even though `WC2` is not a valid prefix; `WC2E` is. So I wanted to be able to match such incomplete postcode prefixes, without falling into the `W1` - `W11` trap. – BenMorel Jun 01 '16 at 21:09
  • Aha in that case you might need an algorithm rather than a single rule. Say you have a formal outcode $outcode and an informal prefix $prefix (e.g. WC2) then you first check for an exact match $outcode==$prefix, if that fails then remove the final alpha character from $outcode (WC2E becomes WC2) and test if that matches $prefix. – Dan Winchester Jun 03 '16 at 11:04
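
The two-step check described in the comment above could be sketched like this in Python. The function names and the sample prefix list are assumptions made for the example, and the outcode is taken naively as everything except the last three characters, so the input is assumed to be a full, valid postcode.

```python
def outcode_matches_prefix(outcode, prefix):
    """Exact match first; otherwise drop one trailing letter and retry."""
    outcode, prefix = outcode.upper(), prefix.upper()
    if outcode == prefix:
        return True
    # WC2E -> WC2: only a final *letter* is removed, so W10 never becomes W1.
    return outcode[-1:].isalpha() and outcode[:-1] == prefix


def delivers_to(postcode, prefixes):
    """Hypothetical helper: does any delivery prefix cover this postcode?"""
    compact = "".join(postcode.split()).upper()
    outcode = compact[:-3]  # assumes the last three characters are the incode
    return any(outcode_matches_prefix(outcode, p) for p in prefixes)


delivery_prefixes = ["EC1", "WC1", "WC2", "W1"]  # sample data from the question
```

Under these assumptions, `delivers_to("WC2E 9RZ", delivery_prefixes)` and `delivers_to("W1A1AA", delivery_prefixes)` are true, while `delivers_to("W10 5AA", delivery_prefixes)` is false.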

I use the following regex, which matches the prefix part only but uses a lookahead to make sure the full postcode is valid (including an optional space):

(GIR|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW]))(?=( )?[0-9][ABD-HJLNP-UW-Z]{2})

It's not quite perfect, as it will match some postcodes that aren't valid (e.g. ones starting AA), but if you're using it to look up the prefix anyway it should do the trick.
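
For anyone who wants to try the first regex above, here is a small Python sketch that applies it unchanged. The wrapper name `extract_outcode` is invented for the example, and the input is upper-cased first because the pattern only lists capital letters.

```python
import re

# The regex from this answer, split over two lines for readability only.
OUTCODE_RE = re.compile(
    r"(GIR|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y]))|[0-9][A-HJKS-UW]))"
    r"(?=( )?[0-9][ABD-HJLNP-UW-Z]{2})"
)


def extract_outcode(postcode):
    """Return the outcode matched at the start of the input, or None."""
    # re.match anchors at the start only; the lookahead does not check that
    # the string ends after the incode, so trailing junk is not rejected.
    match = OUTCODE_RE.match(postcode.strip().upper())
    return match.group(1) if match else None
```

The lookahead is what separates W1 from W10: for `W100AA` the engine first tries two digits (`W10`) and finds a valid incode `0AA` after it, whereas for `W10AA` it has to back off to `W1` before the lookahead succeeds.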

PS: I just noticed that the regex supplied by the UK Government has been updated since I first implemented this, in which case it can be updated to:

(GIR|([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY]))))(?=( )?[0-9][A-Z-[CIKMOV]]{2})
Kevin Owen

In PHP I do:

$first=trim(substr(trim($postcode),0,-3));

This gets the first section of the postcode. I've been using it for years and it just works. It doesn't matter whether the user includes the space (or two spaces) in the middle, because the last section is always 3 characters. I work for a distribution company, and we get charged more for certain postcode areas. You will have a problem if somebody enters their postcode incorrectly, for example if they miss a character from the end.

If the above isn't good enough and you need to check that the postcode the user gave you is valid, http://postcodes.io/ can help.

http://api.postcodes.io/postcodes/W11%202AQ returns JSON that tells you whether the postcode is valid:

{
    "status": 200,
    "result": {
        "postcode": "W11 2AQ",
        "quality": 1,
        "eastings": 524990,
        "northings": 181250,
        "country": "England",
        "nhs_ha": "London",
        "longitude": -0.200056238526337,
        "latitude": 51.5163540527233,
        "parliamentary_constituency": "Kensington",
        "european_electoral_region": "London",
        "primary_care_trust": "Kensington and Chelsea",
        "region": "London",
        "lsoa": "Kensington and Chelsea 004A",
        "msoa": "Kensington and Chelsea 004",
        "nuts": "Colville",
        "incode": "2AQ",
        "outcode": "W11",
        "admin_district": "Kensington and Chelsea",
        "parish": "Kensington and Chelsea, unparished area",
        "admin_county": null,
        "admin_ward": "Colville",
        "ccg": "NHS West London (Kensington and Chelsea, Queen’s Park and Paddington)",
        "codes": {
            "admin_district": "E09000020",
            "admin_county": "E99999999",
            "admin_ward": "E05009392",
            "parish": "E43000210",
            "ccg": "E38000202"
        }
    }
}

Part of the JSON is an "outcode": "W11", which I think is exactly what you are looking for.

You could also use the "eastings":524990,"northings":181250, fields to calculate the straight line distance from the restaurant to the user. The units are metres. Use Pythagoras.
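
A minimal Python sketch of that distance calculation follows. The customer coordinates are the eastings/northings from the JSON above; the restaurant coordinates are invented purely for illustration.

```python
import math


def straight_line_distance_m(easting_a, northing_a, easting_b, northing_b):
    """Pythagoras on grid coordinates; eastings and northings are in metres."""
    return math.hypot(easting_b - easting_a, northing_b - northing_a)


customer = (524990, 181250)    # W11 2AQ, from the API response above
restaurant = (525290, 181650)  # hypothetical restaurant location

distance_m = straight_line_distance_m(*customer, *restaurant)
```

This gives the straight-line distance only, not the distance a driver would actually travel.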

Tim Bray

Since you can compute the length of the postcode the customer entered, and every postcode format ends in 9AA, you could break the postcode down into a few cases and return matches as follows:

firstPart -> postcode with last 3 characters removed
firstPartLength -> length of firstPart
switch (firstPartLength){
    case 2:
        code to compare prefix against A99AA format
    case 3:
        code to compare prefix against A9A9AA, A999AA, AA99AA formats
    case 4:
        code to compare prefix against AA9A9AA, AA999AA formats
}

or if you don't want to truncate the last 3 characters,

length -> length of postcode
switch (length){
    case 5:
        code to compare prefix against A99AA format
    case 6:
        code to compare prefix against A9A9AA, A999AA, AA99AA formats
    case 7:
        code to compare prefix against AA9A9AA, AA999AA formats
}
Josh Durham
  • You haven't accepted an answer yet. Is there something more you're looking for in an answer that we could help you with? – Josh Durham Feb 20 '14 at 19:44
  • I just started a bounty even though I already had a potentially satisfactory answer, precisely to get more people to throw in their ideas/comments, so even though I don't have any more questions at the moment, I'll leave it up to the end just to give it the best chances of having, if not further answers, maybe comments/votes on existing answers! – BenMorel Feb 20 '14 at 20:36
  • Sounds good! I just wanted to make sure we weren't leaving out something you needed. – Josh Durham Feb 20 '14 at 20:41

Given the assumption that every postcode ends in 9AA and every input postcode is valid, the following regex could be used to match the area prefix:

^(\w{2,4})\s*[0-9][a-zA-Z]{2}$

The first capturing group returns the wanted prefix.
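
A short Python sketch of how this regex might be applied; `area_prefix` is a name chosen for the example, and as stated above the input is assumed to be a valid postcode, since the pattern itself checks very little.

```python
import re

PREFIX_RE = re.compile(r"^(\w{2,4})\s*[0-9][a-zA-Z]{2}$")


def area_prefix(postcode):
    """Return the first capturing group (the prefix), or None on no match."""
    match = PREFIX_RE.match(postcode.strip())
    return match.group(1) if match else None
```

Because the trailing `[0-9][a-zA-Z]{2}` must consume exactly the last three characters, `\w{2,4}` backtracks to the right length: `W10AA` gives `W1`, while `W100AA` gives `W10`.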

Max Fichtelmann