6

I am trying to use regular expressions to find a UK postcode within a string.

I have got the regular expression working inside RegexBuddy, see below:

\b[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2}\b

I have a bunch of addresses and want to grab the postcode from them, example below:

123 Some Road Name
Town, City
County
PA23 6NH

How would I go about this in Python? I am aware of the re module for Python but I am struggling to get it working.

Cheers

Eef

Chilledrat
RailsSon
  • You should check: http://www.govtalk.gov.uk/gdsc/schemas/bs7666-v2-0.xsd Especially "(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) [0-9][A-Z-[CIKMOV]]{2})" for a standard regex – nicodemus13 Dec 18 '08 at 15:25
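A note on using the pattern from that comment in Python: the `[A-Z-[QVX]]` form is XML Schema character-class subtraction, which, as far as I know, Python's re module does not support. Below is a rough sketch of a hand translation in which each subtraction is expanded into an explicit class. The name `BS7666_PATTERN`, the helper `find_postcodes`, and the added `\b` anchors (for searching inside longer text) are my own assumptions, not part of the schema.

```python
import re

# Hand-expanded equivalents of the XSD subtractions (my own translation,
# not copied from the schema):
#   [A-Z-[QVX]]    -> [A-PR-UWYZ]
#   [A-Z-[IJZ]]    -> [A-HK-Y]
#   [A-Z-[CIKMOV]] -> [ABD-HJLNP-UW-Z]
# The \b anchors are an addition so the pattern can be used with findall
# on free text; the schema pattern itself validates a whole value.
BS7666_PATTERN = re.compile(
    r"\b(?:GIR 0AA|"
    r"(?:[A-PR-UWYZ][0-9][0-9]?"
    r"|[A-PR-UWYZ][A-HK-Y][0-9][0-9]?"
    r"|[A-PR-UWYZ][0-9][A-HJKSTUW]"
    r"|[A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY])"
    r" [0-9][ABD-HJLNP-UW-Z]{2})\b"
)


def find_postcodes(text):
    """Return every substring of text that matches the translated pattern."""
    return BS7666_PATTERN.findall(text)


print(find_postcodes("123 Some Road Name\nTown, City\nCounty\nPA23 6NH"))
```

Because this pattern is stricter about which letters may appear in each position, a string such as PA2Q 6NH, which the simpler regex in the question accepts, is rejected here.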

3 Answers

10

To test your pattern, I repeated your address three times with the postcodes PA23 6NH, PA2 6NH and PA2Q 6NH, then ran both your regex and the one from Wikipedia against it. The code is:

import re

s = ("123 Some Road Name\nTown, City\nCounty\nPA23 6NH\n123 Some Road Name\nTown, City\n"
     "County\nPA2 6NH\n123 Some Road Name\nTown, City\nCounty\nPA2Q 6NH")

# custom regex from the question
print(re.findall(r'\b[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2}\b', s))

# regex from http://en.wikipedia.org/wiki/UK_postcodes#Validation
print(re.findall(r'[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}', s))

The result is:

['PA23 6NH', 'PA2 6NH', 'PA2Q 6NH']
['PA23 6NH', 'PA2 6NH', 'PA2Q 6NH']

Both regexes give the same result on this input.
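The two patterns are not equivalent in general, though. As a small sketch with a made-up sample string (the names `CUSTOM`, `WIKIPEDIA` and `sample` are only for illustration), the custom regex restricts the last two letters and has no special case for `R` in the third position, so the results diverge:

```python
import re

CUSTOM = re.compile(r"\b[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2}\b")
WIKIPEDIA = re.compile(r"[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}")

# Hypothetical sample: the second item ends in C, which the custom
# character class excludes; the third is the special code GIR 0AA.
sample = "PA23 6NH, PA23 6NC, GIR 0AA"

custom_hits = CUSTOM.findall(sample)
wikipedia_hits = WIKIPEDIA.findall(sample)

print(custom_hits)     # ['PA23 6NH']
print(wikipedia_hits)  # ['PA23 6NH', 'PA23 6NC', 'GIR 0AA']
```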

JV.
  • Since I value clarity I'd modify the regex to be: '[A-Z]{1,2}[\dR][\dA-Z]? \d[A-Z]{2}' (\d instead of [0-9], if you mean "a digit", better say so directly.) – PEZ Dec 18 '08 at 16:08
  • Actually I just took the regex "as-is" from the question. Since the question was about using the re module in Python and not about the regex specifically, I decided against tinkering with the regex in question. – JV. Dec 18 '08 at 16:23
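One caveat on swapping `[0-9]` for `\d`, assuming Python 3 and a str pattern: there `\d` matches any Unicode decimal digit unless the `re.ASCII` flag is given, so the two are not strictly interchangeable. A minimal check (the variable names are only illustrative):

```python
import re

arabic_indic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, not an ASCII digit

# \d on a str pattern is Unicode-aware by default in Python 3.
matches_backslash_d = re.fullmatch(r"\d", arabic_indic_three) is not None

# The explicit class only covers ASCII 0-9.
matches_ascii_class = re.fullmatch(r"[0-9]", arabic_indic_three) is not None

# re.ASCII restricts \d to ASCII digits as well.
matches_with_ascii_flag = (
    re.fullmatch(r"\d", arabic_indic_three, re.ASCII) is not None
)

print(matches_backslash_d, matches_ascii_class, matches_with_ascii_flag)
# True False False
```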
0

Try

import re
re.findall(r"[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2}", x)  # x is the string to search

You don't need the \b.
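Whether the `\b` matters depends on the input. For clean addresses like the one in the question it makes no difference, but on text where a postcode-shaped run sits inside a longer token, dropping it can produce extra matches. A small sketch with a made-up string (the part code is hypothetical):

```python
import re

WITH_BOUNDARY = r"\b[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2}\b"
WITHOUT_BOUNDARY = r"[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2}"

# "XAB12 3DEF" is not a postcode, but it contains a postcode-shaped substring.
text = "Part code XAB12 3DEF, postcode PA23 6NH"

with_hits = re.findall(WITH_BOUNDARY, text)
without_hits = re.findall(WITHOUT_BOUNDARY, text)

print(with_hits)     # ['PA23 6NH']
print(without_hits)  # ['AB12 3DE', 'PA23 6NH']
```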

kris
0
#!/usr/bin/env python

import re

ADDRESS="""123 Some Road Name
Town, City
County
PA23 6NH"""

reobj = re.compile(r'(\b[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2}\b)')
matchobj = reobj.search(ADDRESS)
if matchobj:
    print(matchobj.group(1))

Example output:

[user@host]$ python uk_postcode.py 
PA23 6NH
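Since the question mentions a bunch of addresses, the same search can be wrapped in a small helper and applied to each one. This is only a sketch: `extract_postcode`, `POSTCODE_RE` and the second address are hypothetical names and data for illustration.

```python
import re

# Same pattern as in the question, compiled once for reuse.
POSTCODE_RE = re.compile(r"\b[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2}\b")


def extract_postcode(address):
    """Return the first postcode-shaped match in address, or None."""
    match = POSTCODE_RE.search(address)
    return match.group(0) if match else None


addresses = [
    "123 Some Road Name\nTown, City\nCounty\nPA23 6NH",
    "1 Example Street\nTown",  # hypothetical address with no postcode
]

postcodes = [extract_postcode(a) for a in addresses]
print(postcodes)  # ['PA23 6NH', None]
```

Returning None for addresses without a match keeps the result list aligned with the input list, so missing postcodes are easy to spot.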
Jay