5

I have an ASP.NET website and I want to use a RegularExpressionValidator to check if a UK postcode is English (i.e. it's not Scottish, Welsh or Northern Irish).

It should be possible to see if the postcode is English by using just the letters from the first segment (called the Postcode Area). In total there are 124 postcode areas and this is a list of them.

From that list, the following postcode areas are not in England.

  • ZE,KW,IV,HS,PH,AB,DD,PA,FK,G,KY,KA,DG,TD,EH,ML (Scotland)
  • LL,SY,LD,HR,NP,CF,SA (Wales)
  • BT (N.Ireland)

The input to the regex may be the whole postcode, or it might just be the postcode area.

Can anyone help me create a regular expression that will match only if a given postcode is English?

EDIT - Solution

With help from several posters I was able to create the following regex, which I've tested successfully against over 1500 test cases.

^(AL|B|B[ABDHLNRS]|C[ABHMORTVW]|D[AEHLNTY]|E|E[CNX]|FY|G[LUY]|H[ADGPUX]|I[GMP]|JE|KT|L|L[AENSU]|M|ME|N|N[EGNRW]|O[LX]|P[ELOR]|R[GHM]|S|S[EGKLMNOPRSTW]|T[AFNQRSW]|UB|W|W[ACDFNRSV]|YO)\d{1,2}\s?(\d[\w]{2})?
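As a rough illustration of how this whitelist behaves, here is a minimal Python sketch. It is not the ASP.NET validator itself: the function name `looks_english` is made up for the example, and because the pattern has no `$` anchor the sketch only checks the start of the input.

```python
import re

# Whitelist pattern copied from the solution above. It is case-sensitive,
# so the input is upper-cased before matching.
ENGLISH_POSTCODE = re.compile(
    r"^(AL|B|B[ABDHLNRS]|C[ABHMORTVW]|D[AEHLNTY]|E|E[CNX]|FY|G[LUY]"
    r"|H[ADGPUX]|I[GMP]|JE|KT|L|L[AENSU]|M|ME|N|N[EGNRW]|O[LX]|P[ELOR]"
    r"|R[GHM]|S|S[EGKLMNOPRSTW]|T[AFNQRSW]|UB|W|W[ACDFNRSV]|YO)"
    r"\d{1,2}\s?(\d[\w]{2})?"
)


def looks_english(postcode: str) -> bool:
    """Hypothetical helper: True if the input starts with a whitelisted area plus digits."""
    return ENGLISH_POSTCODE.match(postcode.strip().upper()) is not None


if __name__ == "__main__":
    for sample in ["HG2 8EH", "HG2", "E1 6AN", "G11 5EH", "EH1 1AA", "CF10 1AA", "BT1 1AA"]:
        print(sample, looks_english(sample))
```

Note that the pattern needs at least one digit after the area letters, so an input such as HG2 matches but the bare letters HG do not.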

Robbie
  • Also how many codes are there for Scotland, Wales, and N. Ireland? Because it may be easier to match for negatives than positives depending on the numbers. – Hersha Mar 07 '12 at 20:08
  • @Hersha Yeah, I was planning on doing the negatives – Robbie Mar 07 '12 at 20:14
  • 1
    MK (Milton Keynes) is missing from your regex; see http://en.wikipedia.org/wiki/List_of_postcode_areas_in_the_United_Kingdom. FYI to other people: "GY" for Guernsey, "JE" for Jersey, and "IM" for Isle of Man are included in the regex. This might be fine, but if you just want mainland England you'll have to remove these Crown dependencies. – Neil Sep 24 '13 at 14:00

6 Answers

10

I've already answered once, making the point that it's not possible to come up with a 100% correct England-only regex (since the postcode areas don't lie along political boundaries).

However I've dug a bit deeper into this, and ... well it is possible, but it's a lot of work.

To verify an England-only postcode, you need to exclude the non-English postcodes. The easy ones are:

  • BT (Northern Ireland)
  • IM (Isle of Man)
  • JE (Jersey)
  • GY (Guernsey)
  • BF (British Forces)
  • BX (non-geographic UK postcodes)
  • GIR (Girobank, which is also non-geographic)

(I'm not going to mention UK-style postcodes for territories outside the UK, like St Helena, Gibraltar etc. Technically speaking, the Isle of Man and Channel Islands aren't part of the UK either, but they're much nearer by, and more closely tied into the Royal Mail system in the UK.)

The purely Scottish postcode areas are (as you mentioned):

ZE,KW,IV,HS,PH,AB,DD,PA,FK,G,KY,KA,EH,ML

DG and TD are nominally Scottish, and are for the most part in Scotland. However some areas extend over the Scotland-England border as follows:

  • DG16 - a tiny bit in England
  • TD9 - a tiny bit in England
  • TD12 - half in England
  • TD15 - mostly in England

The breakdown is as follows:

DG16 is in Scotland except for the following English postcodes:

  • DG16 5H[TUZ]
  • DG16 5J[AB]

TD9 is in Scotland except for TD9 0T[JPRSTUW]

TD12 has only one sector (TD12 4), which is spread roughly half and half across England and Scotland:

  • TD12 4[ABDEHJLN] are in Scotland
  • TD12 4[QRSTUWX] are in England

TD15 is the most complicated. There are 3 sectors, of which TD15 2 and TD15 9 are entirely in England.

TD15 1 is split across England and Scotland.

Postcodes beginning as follows are in Scotland:

  • TD15 1T
  • TD15 1X

... except for these English postcodes:

  • TD15 1T[ABQUX]
  • TD15 1XX

All other postcodes in TD15 1 are in England. That includes most of those beginning as follows:

  • TD15 1B
  • TD15 1S (i.e. TD15 1S[ABEJLNPWXY])
  • TD15 1U (i.e. TD15 1U[BDENPQRTUXY])

... which are all in England, with the exception of the following postcodes which are in Scotland:

  • TD15 1BT
  • TD15 1S[UZ]
  • TD15 1U[FGHJLSZ]
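The DG and TD breakdown above can be turned into a lookup along these lines. This is only a sketch under a few assumptions: it needs a full postcode (outward and inward parts), the function names are invented for the example, and any DG or TD postcode not listed above as English is treated as Scottish.

```python
import re
from typing import Optional

# England-side patterns transcribed from the breakdown above.
ENGLISH_PATTERNS = [
    r"DG16 5H[TUZ]",
    r"DG16 5J[AB]",
    r"TD9 0T[JPRSTUW]",
    r"TD12 4[QRSTUWX][A-Z]",
    r"TD15 [29][A-Z]{2}",   # sectors TD15 2 and TD15 9 are entirely English
    r"TD15 1T[ABQUX]",
    r"TD15 1XX",
    # Everything else in TD15 1 that does not start with T or X is English,
    # apart from the listed Scottish units (1BT, 1SU, 1SZ, 1U[FGHJLSZ]).
    r"TD15 1(?![TX])(?!BT|S[UZ]|U[FGHJLSZ])[A-Z]{2}",
]
ENGLISH_RE = re.compile("|".join(f"(?:{p})" for p in ENGLISH_PATTERNS))


def normalise(postcode: str) -> str:
    """Upper-case, drop whitespace, and put one space before the 3-character inward code."""
    compact = re.sub(r"\s+", "", postcode).upper()
    return f"{compact[:-3]} {compact[-3:]}"


def dg_td_country(postcode: str) -> Optional[str]:
    """Return 'England' or 'Scotland' for a full DG/TD postcode, else None."""
    pc = normalise(postcode)
    if not re.match(r"(?:DG|TD)\d", pc):
        return None  # not a DG/TD postcode, so this table says nothing about it
    return "England" if ENGLISH_RE.fullmatch(pc) else "Scotland"
```

Defaulting to Scotland keeps the table short, since DG and TD are nominally Scottish and only the English exceptions need listing.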

The English postcode areas CA and NE lie on the other side of the England-Scotland border; however, they never extend into Scotland.

In fact, the last two letters of a UK postcode are based on how the postman actually delivers post (as far as I'm aware), so there's no guarantee that a postcode falls inside a political boundary. Thus if there's a group of houses which straddles the border, it's possible that the entire postcode (i.e. at the most fine-grained level) does not lie entirely within either England or Scotland. E.g. TD9 0TJ and TD15 1UZ are very close to the border, and I don't really know for sure if they're entirely on one side or not.

The England-Wales border is also complicated; however, I'll leave this as an exercise for the reader.

jim
  • Hi jim. Nearly 4 years on and I found this really useful. Thanks so much for your time and effort here. Quick question - is there a typo in this portion: `TD12 4[ABDEHJLN] are in Scotland | TD12 4[QRSTUWX] are in England`? – michaelmcgurk Apr 27 '17 at 16:50
4

There are 124 Postcode Areas in the UK.

-- PAF® statistics August 2012, via List of postcodes in the United Kingdom (Wikipedia).

I recommend breaking your problem down into two parts (think functions):

  1. Is the postcode valid?

    UK Postcode Regex (Comprehensive)

  2. Is the postcode English?

    This can be broken down further:

    • Not Scottish:
      • ! /^(ZE|KW|IV|HS|PH|AB|DD|PA|FK|G|KY|KA|DG|TD|EH|ML)[0-9]/
    • Not Welsh:
      • ! /^(LL|SY|LD|HR|NP|CF|SA)[0-9]/
    • Not Northern Irish, Manx, from the Channel Islands, ...
      • et cetera...
    • or you could just check that the Postcode Area is among the hundred or so English ones, depending on how you want to optimise ☻

Note that the syntax will vary according to your programming language. Doing all this in one regular expression would soon become unmanageable.
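A minimal Python sketch of this two-function split follows. The structural check reuses the "Standard UK PostCodes" pattern quoted in another answer on this page. The Scottish and Welsh lists are copied as given above, the third exclusion list (BT, IM, JE, GY) is my assumption for the "et cetera" item, and the function names are illustrative only.

```python
import re

# Part 1: is the postcode structurally valid?
VALID = re.compile(
    r"^([A-PR-UWYZ](?:[0-9]{1,2}|[0-9][A-HJKMNPR-Y]|[A-HK-Y][0-9]{1,2}"
    r"|[A-HK-Y][0-9][ABEHMNPRVWXY]))\s*([0-9][ABD-HJLNP-UW-Z]{2})$",
    re.IGNORECASE,
)

# Part 2: exclusion lists. The trailing [0-9] stops "G" from matching "GL1".
NOT_ENGLISH = [
    re.compile(r"^(ZE|KW|IV|HS|PH|AB|DD|PA|FK|G|KY|KA|DG|TD|EH|ML)[0-9]"),  # Scottish
    re.compile(r"^(LL|SY|LD|HR|NP|CF|SA)[0-9]"),                            # Welsh
    re.compile(r"^(BT|IM|JE|GY)[0-9]"),  # assumed: N. Ireland, Isle of Man, Jersey, Guernsey
]


def is_valid(postcode: str) -> bool:
    return VALID.match(postcode.strip()) is not None


def is_english(postcode: str) -> bool:
    candidate = postcode.strip().upper()
    return is_valid(candidate) and not any(p.match(candidate) for p in NOT_ENGLISH)
```

Keeping the two checks separate means the exclusion lists can be edited without touching the structural pattern.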

Community
johnsyweb
  • When I first saw this, I thought it wasn't right, but then (I think) you edited it and added the [0-9] at the end and that has changed everything. That fixes the problem of matching chars from the remaining input. Your fourth point about using only English postcodes is probably the way to go (although I thought the opposite originally). It's a slightly longer regex, but it saves me from having to worry about other random places like Gibraltar, Channel Islands, etc., so the credit for this one is yours :) – Robbie Mar 08 '12 at 19:49
  • I think this works: ^(AL|B|B[ABDHLNRS]|C[ABHMORTVW]|D[AEHLNTY]|E|E[CNX]|FY|G[LUY]|H[ADGPUX]|I[GMP]|JE|KT|L|L[AENSU]|M|ME|N|N[EGNRW]|O[LX]|P[ELOR]|R[GHM]|S|S[EGKLMNOPRSTW]|T[AFNQRSW]|UB|W|W[ACDFNRSV]|YO)\d{1,2}\s?(\d[\w]{2})? – Robbie Mar 10 '12 at 21:45
  • Link to cabinet office is broken. – Neil Sep 17 '13 at 13:27
  • 1
    @Neil: I've replaced the dead link. Feel free to edit answers if you spot dead links. – johnsyweb Sep 17 '13 at 22:41
3

It's not possible to come up with an England-only regex, because the postcode areas don't lie along political boundaries, at least not at the postcode area or district level.

For example, CH1 is in England, and CH5 is in Wales.

At the postcode district level there are still problems, for example TD12 is half in England, half in Scotland.

The only area which you can rely on is BT (Northern Ireland).
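One way to cope with this, sketched below in Python, is to look up the postcode district first and fall back to the area only when there is no district-level entry. The two tables hold just the examples mentioned in this answer. A real table would need many more rows, and the names are invented for the illustration.

```python
import re
from typing import Optional

# Illustrative data only, limited to the examples given in this answer.
DISTRICT_OVERRIDES = {"CH5": "Wales"}
AREA_DEFAULTS = {"CH": "England", "BT": "Northern Ireland"}


def outward_code(postcode: str) -> str:
    """Return the outward part (e.g. 'CH5') of a full or district-only postcode."""
    compact = re.sub(r"\s+", "", postcode).upper()
    # Outward codes are at most 4 characters, so 5 or more means a full postcode.
    if len(compact) >= 5 and re.search(r"\d[A-Z]{2}$", compact):
        return compact[:-3]
    return compact


def country_for(postcode: str) -> Optional[str]:
    district = outward_code(postcode)
    m = re.fullmatch(r"([A-Z]{1,2})\d{1,2}[A-Z]?", district)
    if m is None:
        return None
    # District entries win over the area default.
    return DISTRICT_OVERRIDES.get(district, AREA_DEFAULTS.get(m.group(1)))
```

As noted above, even districts can straddle a border (TD12, for instance), so a table like this is still an approximation.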

jim
  • You're right that it's not possible to be 100% correct due to the boundaries, but the solution I posted in the question edit was accurate enough for my client's needs. – Robbie Nov 30 '12 at 12:13
1

These are the regexes I put together, which follow the Royal Mail defined standards for all UK postcode types:

Standard UK PostCodes:

/^([A-PR-UWYZ](?:[0-9]{1,2}|[0-9][A-HJKMNPR-Y]|[A-HK-Y][0-9]{1,2}|[A-HK-Y][0-9][ABEHMNPRVWXY]))\s*([0-9][ABD-HJLNP-UW-Z]{2})$/i

GiroBank PostCodes:

/^(GIR)\s*(0AA)$/i

UK Overseas Territories:

/^([A-Z]{4})\s*(1ZZ)$/i

British Forces Post Office:

/^(BFPO)\s*(?:(c\/o)\s*)?((?(2)[0-9]{1,3}|[0-9]{1,4}))$/i

And this is the function I wrote which validates a postcode against these four types and allows type detection:

public function UKPostCode(&$strPostCode, &$strError = null, &$strType = null, $ReturnFormatted = true) {
    // Strip whitespace and hyphens; used only to detect an effectively empty input.
    $strStrippedPostCode = preg_replace("/[\s\-]/i", "", $strPostCode);

    if (empty($strStrippedPostCode)) {
        $strError = $this->__getErrorMessage("Post", "EMPTY_POST");
        return false;
    }

    // One pattern per postcode type; the first pattern that matches sets the type.
    $arrRegExp = array(
        "STD" => "/^([A-PR-UWYZ](?:[0-9]{1,2}|[0-9][A-HJKMNPR-Y]|[A-HK-Y][0-9]{1,2}|[A-HK-Y][0-9][ABEHMNPRVWXY]))\s*([0-9][ABD-HJLNP-UW-Z]{2})$/i",
        "GIR" => "/^(GIR)\s*(0AA)$/i",
        "OST" => "/^([A-Z]{4})\s*(1ZZ)$/i",
        "BFPO" => "/^(BFPO)\s*(?:(c\/o)\s*)?((?(2)[0-9]{1,3}|[0-9]{1,4}))$/i"
    );

    foreach ($arrRegExp as $strPostCodeType => $strExpression) {
        if (preg_match($strExpression, $strPostCode, $arrMatches)) {
            // Pass null as $ReturnFormatted to leave $strPostCode untouched.
            if ($ReturnFormatted !== null) {
                array_shift($arrMatches); // drop the full-match entry, keep the captured parts
                $strPostCode = implode(" ", array_filter($arrMatches));
                $strPostCode = ((bool)$ReturnFormatted === true) ? strtoupper($strPostCode) : strtolower($strPostCode);
            }

            $strType = $strPostCodeType;
            return true;
        }
    }

    $strError = $this->__getErrorMessage("Post", "INVALID_POST");
    return false;
}

Hope this helps

Seth
  • Nothing here that answers the actual question regarding England only postcodes, but useful as a resource, so thanks for contributing. I'll give you your first 10 points - welcome to SO :) – Robbie Jul 19 '13 at 10:35
1

Use ^(AB|AL|B| ... )$, where the ... is where you fill the rest of the valid ones in, separated by pipes (|).

EDIT: There's a boatload of information here: http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom

If you were to include the in/out codes, it would be something like ^(AB|AL|B| ... )([\d\w]{3})\s([\d\w]{3})$, which would get the rest of the code.

EDIT

^(A[BL]|B[ABDHLNRST]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[CNX]?|F[KY]|G[LUY]|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]?|T[AFNQRSW]|UB|W[ACDFNRSV]?|YO|ZE)([\w\d]{1,2})\s?([\w\d]{3})$

Part of this regex is taken from another one of the answers. It matches a valid postcode area, then one or two ({1,2}) letters (\w) or digits (\d), an optional space (\s?), then three letters or digits. Hope that helps.
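To show why the ^ anchor matters here, the following Python sketch uses a deliberately shortened alternation (just E[CNX]? and G[LUY], not the full list) and compares an unanchored search with an anchored match. The variable and function names are made up for the example.

```python
import re

# Shortened, illustrative alternation; not the full list of areas.
AREAS = r"E[CNX]?|G[LUY]"

unanchored = re.compile(AREAS)
anchored = re.compile(rf"^(?:{AREAS})\d")


def first_hit(pattern: "re.Pattern[str]", text: str) -> "str | None":
    """Return the text matched by the first hit anywhere in the input, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    # Unanchored: the single-letter area E is found inside the inward code of a Glasgow postcode.
    print(first_hit(unanchored, "G11 5EH"))
    # Anchored with a trailing digit: only the start of the postcode is considered.
    print(first_hit(anchored, "G11 5EH"), first_hit(anchored, "GL1 1AA"))
```

Anchoring at the start and requiring a digit straight after the area letters keeps single-letter areas such as E from matching elsewhere in the string.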

Derreck Dean
  • Yeah, this is what I thought initially, but won't that incorrectly exclude valid postcodes that contain those characters in other places? For example, G is Glasgow, but there are valid English postcodes that contain the letter G, such as GL. – Robbie Mar 07 '12 at 20:17
  • That's what the ^ and $ are for - it forces it to match the whole string rather than just a part. – Derreck Dean Mar 08 '12 at 19:59
  • I tried to flip this around and put in the valid English postcodes. I removed the ^ as I want it to match them and tried it with several samples. It looked like this: "(AL|B|BA|... rest of the valid codes)$". It didn't seem to work. For example, HG2 8EH does not match, but it should. Did I misinterpret how to do this? – Robbie Mar 08 '12 at 20:04
  • I really like this solution because it validates both the correctness of the postcode structure and the Englishness of it. The only problem (for me) is that it requires the whole postcode to be entered (and doesn't handle just Postcode Area inputs). I appreciate that it may be of use to others who do need the full postcode validating, and so I've upvoted it. – Robbie Mar 09 '12 at 22:50
  • I've commented on the accepted answer showing how I changed your suggestion so that it works with partial and full postcodes and with a whitelist instead of a blacklist – Robbie Mar 10 '12 at 04:00
0
'A[BL]|B[ABDHLNRST]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[CNX]?|F[KY]|G[LUY]|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]?|T[AFNQRSW]|UB|W[ACDFNRSV]?|YO|ZE'
bluepnume
  • I've tried this regex and it returns matches for quite a few non-English postcodes, for example G11 5EH (Glasgow; it's matching on the 'E'). Anything that contains any of the single-letter outcodes in the incode is matched with this regex. This is the same issue I mentioned on Derreck Dean's answer – Robbie Mar 08 '12 at 19:28