
Disclaimer: I know from this answer that regex isn't great for U.S. addresses since they're not regular. However, this is a fairly small project and I want to see if I can reduce the number of false positives.

My challenge is to distinguish between addresses like "123 SOUTH ST" and "123 SOUTH MAIN ST", i.e. to match only the second kind. The best solution I can come up with is to check whether more than one word comes after the directional word.

My Python regex is of the form:

^(NORTH|SOUTH|EAST|WEST)(\s\S*\s\S*)+$

Explanation:

  • ^(NORTH|SOUTH|EAST|WEST) matches direction at the start of the string
  • (\s\S*\s\S*)+$ attempts to match, one or more times, a space, a word of any length, another space, and another word of any length, up to the end of the string
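A quick way to see what this pattern actually does is to run it against a few sample strings with Python's standard `re` module. This is a minimal sketch; the strings other than the two from the question are made up for illustration. Because the pattern is anchored at the direction word, anything that starts with a house number can never match. Also, each repetition of the group consumes exactly two whitespace characters, so the text after the direction only matches when it contains an even number of them.

```python
import re

# The pattern from the question, unchanged.
QUESTION_PATTERN = re.compile(r"^(NORTH|SOUTH|EAST|WEST)(\s\S*\s\S*)+$")

samples = [
    "123 SOUTH ST",       # from the question
    "123 SOUTH MAIN ST",  # from the question
    "SOUTH ST",           # illustrative: no house number, one word after
    "SOUTH MAIN ST",      # illustrative: no house number, two words after
    "SOUTH MAIN ST EXT",  # illustrative: three words after (odd count)
]

for s in samples:
    # re.match only tries position 0, which the leading ^ also requires.
    print(s, "->", bool(QUESTION_PATTERN.match(s)))
```

Only "SOUTH MAIN ST" should report True here: the house-number strings fail on the `^` anchor, and "SOUTH MAIN ST EXT" fails because its tail has three whitespace characters.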

But my expression doesn't seem to distinguish between the two types of address. Where's my error (besides using regex for U.S. addresses)?

Thanks for your help.

The fourth bird
  • Do you need the capture group values? This part `(\s\S*\s\S*)+` repeats the capture group, and Python's `re` only keeps the value of the last iteration. Also, `\S*` is optional (it can match zero characters), so the group will also match just two whitespace characters (which can also be two newlines). – The fourth bird Feb 17 '22 at 20:35
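Both points in that comment can be checked directly. This is a small sketch using the question's pattern; the input strings are made up for illustration.

```python
import re

PATTERN = re.compile(r"^(NORTH|SOUTH|EAST|WEST)(\s\S*\s\S*)+$")

# A repeated capture group keeps only the text from its final iteration:
# " A B" is consumed first, then " C D" overwrites it.
m = PATTERN.match("SOUTH A B C D")
print(repr(m.group(2)))  # ' C D'

# \S* may match zero characters, so two bare whitespace characters
# (here two newlines) are enough to satisfy the group.
m = PATTERN.match("SOUTH\n\n")
print(repr(m.group(2)))  # '\n\n'
```

If the repeated words are actually needed, capturing the whole tail in one outer group (or using a non-capturing inner group) avoids losing the earlier iterations.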

1 Answer


Your regex misses the house number at the beginning of the address and treats the optional word (MAIN in this case) as mandatory. Try this:

^\d+ (NORTH|SOUTH|EAST|WEST)((\s\S*)?\s\S*)+$
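Here is a minimal sketch that runs the suggested pattern against the two addresses from the question, plus a three-word example added for illustration. With `^\d+ ` accounting for the house number and the optional `(\s\S*)` allowing an odd number of words, every one of these should match. In particular, "123 SOUTH ST" matches too, so this pattern by itself does not separate one-word street names from multi-word ones.

```python
import re

# The pattern suggested in the answer, unchanged.
ANSWER_PATTERN = re.compile(r"^\d+ (NORTH|SOUTH|EAST|WEST)((\s\S*)?\s\S*)+$")

for s in ["123 SOUTH ST", "123 SOUTH MAIN ST", "123 SOUTH MAIN ST EXT"]:
    print(s, "->", bool(ANSWER_PATTERN.match(s)))
```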

Omut
  • That's it - thank you! Regex was my intro to scripting and I always appreciate feedback. – wakanda_official_tourism Feb 17 '22 at 18:56
  • Hi! It looks like this is your first answer. I just wanted to let you know that you can surround your code examples with triple-backticks (```) to display them nicely, with syntax highlighting. Thanks for contributing! – bastien girschig Feb 17 '22 at 20:03
  • If you are already repeating the whole construct of `\s\S*` in the group, having an optional preceding one which does the same `(\s\S*)?` seems unnecessary. – The fourth bird Feb 17 '22 at 20:33
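The simplification from that comment can be compared against the original answer's pattern. Below is a sketch that does so. `TWO_OR_MORE_WORDS` is a hypothetical variant that does not come from this thread. It uses non-capturing groups, `\S+` so each word is non-empty, and `{2,}` to require at least two words after the direction, which is one way to express the "more than one word" check described in the question.

```python
import re

ANSWER_PATTERN = re.compile(r"^\d+ (NORTH|SOUTH|EAST|WEST)((\s\S*)?\s\S*)+$")

# Simplification from the comment: the optional leading (\s\S*) adds
# nothing once the group itself repeats one or more times.
SIMPLIFIED = re.compile(r"^\d+ (NORTH|SOUTH|EAST|WEST)(\s\S*)+$")

# Hypothetical stricter variant (not from the thread): requires two or
# more non-empty words after the direction word.
TWO_OR_MORE_WORDS = re.compile(r"^\d+\s+(?:NORTH|SOUTH|EAST|WEST)(?:\s+\S+){2,}$")

samples = ["123 SOUTH ST", "123 SOUTH MAIN ST", "123 NORTH MAIN ST EXT"]
for s in samples:
    print(
        s,
        bool(ANSWER_PATTERN.match(s)),
        bool(SIMPLIFIED.match(s)),
        bool(TWO_OR_MORE_WORDS.match(s)),
    )
```

The first two patterns should agree on every input, since `((\s\S*)?\s\S*)+` and `(\s\S*)+` both describe one or more `\s\S*` chunks. Only the hypothetical variant should reject "123 SOUTH ST".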