I need a regular expression that finds `100 ABCDEF` in the input string `Suite 400 - 100 ABCDEF`. I wrote the regex below, but it also picks up the value from the Suite part.

[^-\s]\d.+
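
To make the problem reproducible, here is a minimal sketch of what the pattern above does. The thread does not say which regex tool is in use, so Python's `re` module is an assumption here, and the variable names are only illustrative.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"[^-\s]\d.+")

text = "Suite 400 - 100 ABCDEF"

# search() returns the leftmost match, which starts at the suite number.
match = pattern.search(text)
found = match.group() if match else None
print(found)
```

The match starts at `400` because that is the first place where a non-hyphen, non-space character is followed by a digit, and `.+` then consumes the rest of the line, hyphen included.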
Alex Shesterov
Ankit
  • Sure, just give us the logic behind how we should match `100 ABCDEF`. – Tim Biegeleisen Sep 13 '18 at 15:19
  • Possible duplicate of [Reference - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – Paolo Sep 13 '18 at 15:19
  • Also please tell us which regex tool/language you are using. – Tim Biegeleisen Sep 13 '18 at 15:21
  • Thanks for the quick replies, folks, and excuse my limited knowledge of regex. I'm extracting OCR data from an unstructured document, and it returns a suite/unit number as a prefix in front of any address. I'm trying to find values like `123 ABC Street`, `123 ABC Avenue`, and `Suite 123 - 123 ABC Rd` (in the last string, I only intend to match `123 ABC Rd`). As for the street names, I have a dictionary of street names across North America that matches them: `[^-\s]\d.+DictionaryMatch` – Ankit Sep 13 '18 at 15:24

2 Answers

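Just put `$` at the end of your regex; `$` means "end of line". Also, replace the dot with `[^-]`, so that the rest of the match can contain only non-hyphens. The `?` makes the first character class optional, so addresses without a prefix still match: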

[^-\s]?\d[^-]+$
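
As a rough check, the pattern can be run against the sample inputs mentioned in this thread. This is only a sketch: the question does not name a regex tool, so Python's `re` module is an assumption, and `extract_address` is a hypothetical helper name.

```python
import re

# The pattern from this answer, unchanged.
ANSWER_PATTERN = re.compile(r"[^-\s]?\d[^-]+$")


def extract_address(text):
    """Return the matched tail of the line, or None if nothing matches."""
    match = ANSWER_PATTERN.search(text)
    return match.group() if match else None


# Sample inputs taken from the question and its comments.
samples = [
    "Suite 400 - 100 ABCDEF",
    "123 ABC Street",
    "5 ABC Bay SW",
    "1 123 Abc Street",
]

for sample in samples:
    print(repr(sample), "->", repr(extract_address(sample)))
```

Note that the pattern relies on the hyphen as the separator. An input with a space-separated prefix, such as `1 123 Abc Street`, is returned whole, because there is no hyphen to stop `[^-]+` from reaching back to the first digit.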
Thom A
Alex Shesterov
  • Thanks for the reply, it works in most scenarios but I noticed that the following type fails: 5 ABC Bay SW. – Ankit Sep 13 '18 at 15:48
  • Additionally, for certain addresses like `1 123 Abc Street`, it picks `1 123` instead of just `123`. – Ankit Sep 13 '18 at 15:56
  • I thought the hyphen (actually a minus sign) was mandatory. To handle this, just put a `?` after the first character class (the one that contains the minus sign). This will make it optional. I've updated the answer and the fiddle. – Alex Shesterov Sep 13 '18 at 16:02
  • Thanks, Alex, your help is much appreciated. – Ankit Sep 13 '18 at 16:08
Since you're trying to match a US street address, you should try matching a number followed by one or more words instead:

\d+(?:\s+[A-Za-z.]+)+

Demo: https://regex101.com/r/y6n5jD/1
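
For a quick local check outside regex101, a minimal sketch along these lines could be used. Python's `re` module is an assumption (the question does not name a tool), and `extract_number_and_words` is a hypothetical helper name.

```python
import re

# The pattern from this answer: a number followed by one or more
# whitespace-separated words made of letters and dots.
WORDS_PATTERN = re.compile(r"\d+(?:\s+[A-Za-z.]+)+")


def extract_number_and_words(text):
    """Return the first 'number + words' run in text, or None."""
    match = WORDS_PATTERN.search(text)
    return match.group() if match else None


# Sample inputs taken from the question and its comments.
samples = [
    "Suite 400 - 100 ABCDEF",
    "Suite 123 - 123 ABC Rd",
    "5 ABC Bay SW",
    "1 123 Abc Street",
]

for sample in samples:
    print(repr(sample), "->", repr(extract_number_and_words(sample)))
```

Because the number must be followed directly by a word, a suite number followed by a hyphen or by another number is skipped, so this does not depend on the hyphen being present.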

blhsing