I've looked at and tried the solutions from previous questions on this topic (here and here), but I can't get them to work.

I'm looking for a regex for the outer part of a UK postcode. In "PO1 1AF", PO1 is the outward postcode (the postcode district) and 1AF is the inward postcode. I have a long list of URLs, some of which have an outer postcode at the end.

E.g., I want "ab15" and "dd9" from these two strings:

string1= "www.xyz.com/abcdab15/"
string2 = "www.xyz.com/adbdd9"

The permutations for the outer postcode (A = letter, 9 = digit) are:

A9
A9A
A99
AA9
AA9A
AA99

I tried this expression from a previous answer, which is meant to match the inner part, the outer part, or both, but it doesn't return anything (the answer was written for upper-case letters):

exp = '^((([A-PR-UWYZ][0-9])|([A-PR-UWYZ][0-9][0-9])|([A-PR-UWYZ][A-HK-Y][0-9])|([A-PR-UWYZ][A-HK-Y][0-9][0-9])|([A-PR-UWYZ][0-9][A-HJKSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]))) || ^((GIR)[ ]?(0AA))$|^(([A-PR-UWYZ][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][0-9][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][A-HK-Y0-9][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][A-HK-Y0-9][0-9][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][0-9][A-HJKS-UW0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$|^(([A-PR-UWYZ][A-HK-Y0-9][0-9][ABEHMNPRVWXY0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))$'

import re

url1= "www.xyz.com/abcdAB15/"
url2 = "www.xyz.com/adbDD9"

postalCode = re.findall(exp,url1)
print postalCode[0]

Here is the expression free of all $ and ^ anchors as suggested below:

exp = '((([A-PR-UWYZ][0-9])|([A-PR-UWYZ][0-9][0-9])|([A-PR-UWYZ][A-HK-Y][0-9])|([A-PR-UWYZ][A-HK-Y][0-9][0-9])|([A-PR-UWYZ][0-9][A-HJKSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]))) || ((GIR)[ ]?(0AA))|(([A-PR-UWYZ][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))|(([A-PR-UWYZ][0-9][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))|(([A-PR-UWYZ][A-HK-Y0-9][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))|(([A-PR-UWYZ][A-HK-Y0-9][0-9][0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))|(([A-PR-UWYZ][0-9][A-HJKS-UW0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))|(([A-PR-UWYZ][A-HK-Y0-9][0-9][ABEHMNPRVWXY0-9])[ ]?([0-9][ABD-HJLNPQ-UW-Z]{0,2}))'
eamon1234

2 Answers

Given the possibilities you list for an outer postcode, it can be described as:

  • one or two letters
  • followed by a digit
  • optionally followed by a letter or digit

Which, in regex terms, is this:

[a-z]{1,2}[0-9][a-z0-9]?

... but you only want to find that pattern at the end of the URL (possibly followed by a slash), so we'll add a lookahead:

[a-z]{1,2}[0-9][a-z0-9]?(?=/?$)

The full-postcode regex in your question incorporates a number of different exclusions. For example, it looks like V, Q and X aren't allowed in some places, and there are apparently other limitations - I won't bother to try to replicate those (reading other people's regexes is never fun) ... but using what we have:

>>> import re
>>> postcode = re.compile("[a-z]{1,2}[0-9][a-z0-9]?(?=/?$)")
>>> string1= "www.xyz.com/abcdab15/"
>>> string2 = "www.xyz.com/adbdd9"
>>> re.findall(postcode, string1)
['ab15']
>>> re.findall(postcode, string2)
['dd9']
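
The URLs in the question's second snippet use upper-case letters (`AB15`, `DD9`), which the lower-case character classes above would not match. Here is a minimal sketch, assuming case should be ignored and the result normalised to lower case; `outward_code` and `OUTWARD_AT_END` are hypothetical names, not something from the question:

```python
import re

# Same pattern as above, compiled case-insensitively so that both
# "AB15" and "ab15" are accepted.
OUTWARD_AT_END = re.compile(r"[a-z]{1,2}[0-9][a-z0-9]?(?=/?$)", re.IGNORECASE)


def outward_code(url):
    """Return the outer postcode at the end of url (lower-cased), or None."""
    match = OUTWARD_AT_END.search(url)
    return match.group(0).lower() if match else None


print(outward_code("www.xyz.com/abcdAB15/"))  # ab15
print(outward_code("www.xyz.com/adbDD9"))     # dd9
print(outward_code("www.xyz.com/about/"))     # None
```

Note that this pattern only checks the shape of the text, so any URL ending in one or two letters plus a digit will match too (for example, a path ending in `page2` yields `ge2`). If that matters, the candidates would need checking against a list of real postcode districts.
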
Zero Piraeus

The problem is the ^ and $ anchors, which match at the start and end of the string respectively, so each alternative can only match an entire string. Remove them from each alternative (exp split on |) and it will work.
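
To illustrate the effect of the anchors, here is a small sketch using a simplified stand-in for a single alternative (the `AA99` shape) rather than the full expression from the question; the variable names are illustrative only:

```python
import re

url = "www.xyz.com/abcdAB15/"

# Simplified stand-in for one alternative of the long expression.
anchored = r"^[A-PR-UWYZ][A-HK-Y][0-9][0-9]$"
unanchored = r"[A-PR-UWYZ][A-HK-Y][0-9][0-9]"

# With ^ and $ the pattern must cover the whole string, so a URL never matches.
print(re.findall(anchored, url))      # []

# Without the anchors the pattern can match anywhere inside the URL.
print(re.findall(unanchored, url))    # ['AB15']

# The anchored form only succeeds when the string is nothing but the postcode.
print(re.findall(anchored, "AB15"))   # ['AB15']
```
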

ecatmur
  • So remove all ^ and $ from the expression? I've posted that to the article description but it still isn't giving me the postcodes. – eamon1234 Nov 30 '12 at 16:22
  • @user578582 the ` || ` in the middle of the expression looks incorrect; it should just be another `|`. – ecatmur Nov 30 '12 at 16:30
  • Ah, right you are. However that solution gives a big result like: ('', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'AB15', 'AB1', '5', '', '', '', '', '', '', '', '', ''). The answer above does the trick I think but thanks for your help. – eamon1234 Nov 30 '12 at 16:33
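
The long tuple in the last comment comes from how `re.findall` treats capturing groups: when the pattern contains more than one group, it returns one tuple per match with an entry for every group, using `''` for groups that did not take part. A minimal sketch with a cut-down, two-alternative stand-in for the question's expression (not the full pattern) shows the behaviour and two possible ways around it:

```python
import re

url = "www.xyz.com/abcdAB15/"

# Capturing groups: findall returns a tuple of all groups per match.
capturing = r"([A-PR-UWYZ][0-9])|([A-PR-UWYZ][A-HK-Y][0-9][0-9])"
print(re.findall(capturing, url))        # [('', 'AB15')]

# Option 1: non-capturing groups (?:...) make findall return whole matches.
non_capturing = r"(?:[A-PR-UWYZ][0-9])|(?:[A-PR-UWYZ][A-HK-Y][0-9][0-9])"
print(re.findall(non_capturing, url))    # ['AB15']

# Option 2: keep the groups but read group(0), the full match, via finditer.
whole_matches = [m.group(0) for m in re.finditer(capturing, url)]
print(whole_matches)                     # ['AB15']
```
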