I have the following Python script (in Jupyter) which is supposed to extract address information using regex (unit numbers are already cleaned up and street types are abbreviated before this step):

type_opts = r"Terrace|Way|Walk|St|Rd|Ave|Cl|Ct|Cres|Blvd|Dr|Ln|Pl|Sq|Pde"
road_attrs_pattern = r"(?P<rd_no>\w?\d+(\-\d+)?\w?\s+)(?P<rd_nm>[a-zA-z \-]+)(?#\s+(?P<rd_tp>" + type_opts + ")"
print("Road Attr Pattern: ", road_attrs_pattern)
road_attrs = re.match(road_attrs_pattern, proc_addr)
road_num = road_attrs.group('rd_no').strip()
print("Road number: ", road_num)
road_name = road_attrs.group('rd_nm').strip()
print("Road name: ", road_name)
road_type = road_attrs.group('rd_tp').strip()
print("Road type: ", road_type)

I'm using this address:

Burrah lodge, 15 Anne Jameson Pl

This results in the following print-out:

Road Attr Pattern:  (?P<rd_no>\w?\d+(\-\d+)?\w?\s+)(?P<rd_nm>[a-zA-z \-]+)(?#\s+(?P<rd_tp>Terrace|Way|Walk|St|Rd|Ave|Cl|Ct|Cres|Blvd|Dr|Ln|Pl|Sq|Pde)

It then fails when reading the road number, with the error AttributeError: 'NoneType' object has no attribute 'group'.

However, pasting the pattern and the address into Regex101 shows a match, and from reading the regex I also think it should work.

It should print the following:

Road Attr Pattern:  (?P<rd_no>\w?\d+(\-\d+)?\w?\s+)(?P<rd_nm>[a-zA-z \-]+)(?#\s+(?P<rd_tp>Terrace|Way|Walk|St|Rd|Ave|Cl|Ct|Cres|Blvd|Dr|Ln|Pl|Sq|Pde)
Road number: 15
Road name: Anne Jameson
Road type: Pl
AER

1 Answer

According to the docs, re.match checks for a match at the beginning of the string.

Since you're looking for a match that starts partway through the string, you'll want re.search instead.
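
As a minimal sketch of the difference, the snippet below runs both functions on the address from the question. It assumes the street type is meant to be captured by an ordinary named group: in Python's re, "(?#...)" is an inline comment, so the pattern as pasted would not define rd_tp at all. The character class is also written as [a-zA-Z] here, which is my guess at the intent.

```python
import re

# Assumption: rd_tp is a normal named group, not hidden inside a "(?#...)" comment.
type_opts = r"Terrace|Way|Walk|St|Rd|Ave|Cl|Ct|Cres|Blvd|Dr|Ln|Pl|Sq|Pde"
road_attrs_pattern = (
    r"(?P<rd_no>\w?\d+(-\d+)?\w?\s+)"      # road number, e.g. "15 " or "12-14 "
    r"(?P<rd_nm>[a-zA-Z \-]+)"             # road name, e.g. "Anne Jameson"
    r"\s+(?P<rd_tp>" + type_opts + r")"    # abbreviated street type, e.g. "Pl"
)

proc_addr = "Burrah lodge, 15 Anne Jameson Pl"

# re.match only tries position 0, where "Burrah" is not a road number.
match_result = re.match(road_attrs_pattern, proc_addr)
print("re.match: ", match_result)

# re.search scans forward and succeeds once it reaches "15 Anne Jameson Pl".
road_attrs = re.search(road_attrs_pattern, proc_addr)
if road_attrs is None:
    raise ValueError("no road attributes found in: " + proc_addr)

road_num = road_attrs.group("rd_no").strip()
road_name = road_attrs.group("rd_nm").strip()
road_type = road_attrs.group("rd_tp").strip()
print("Road number: ", road_num)
print("Road name: ", road_name)
print("Road type: ", road_type)
```

Checking the result for None before calling .group() also turns the AttributeError into a message that names the address that failed.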

ethguo
  • OK, that's embarrassing, thanks for your help. I'll adjust the question as well to help other people point to it... – AER Nov 15 '19 at 06:34