
I want my regular expression to be able to recognize a street address that ends in a zip code and starts with a number.

So if my sample string is

'abcd 123 abcd 1600 Penn Ave. Washington D.C. 12345 hello, world'

I want it to match only

1600 Penn Ave. Washington D.C. 12345

I'm stuck on using

.match(/\d+.*\d{5}/)

but this returns

123 abcd 1600 Penn Ave. Washington D.C. 12345

How can I get the match to start at the run of digits closest to the zip code (1600 here) rather than at the first digits in the string?

redcup
  • Regex is not intelligent enough to guess what your address is. It will give you the string that matches your pattern. – Rohit Jain Jan 21 '13 at 22:23
  • I might try this: http://smartystreets.com/how-to/regex-street-address. I'm guessing you're not going to be able to get all the way there with regex. Address parsing has a lot of complexities that don't immediately meet the eye. – Jason Swett Jan 21 '13 at 22:26
  • Well I guess all I'm wondering is if there is a way for it to match the last instance of \d+ rather than the first – redcup Jan 21 '13 at 22:27

3 Answers


This is also an option for you:

.match(/\d+(\s(\D+|\d+\D{2})){3,6}\d{5}/)

This means:

  1. Look for a group of digits
  2. Make sure it's followed by between 3 and 6 groups, each made of one whitespace character plus some characters. Those characters can be either a run of non-digits (\D+) or a run of digits followed by two non-digits (\d+\D{2}). The latter handles ordinals such as 1st and 3rd, as the Tin Man mentions, but it won't match something like Apt. 2 correctly.
  3. The range of 3 to 6 groups lets the regex match addresses of slightly different lengths; adjust those numbers as needed.
  4. Make sure there is a zip code at the end of the match
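
The steps above can be tried out with a minimal sketch like the following. Only the pattern comes from this answer; the constant names, the `first_address` helper, and the second sample string (with a numbered street) are made up for illustration.

```ruby
# Pattern from this answer. All names below are illustrative, not from the question.
GROUPED_ADDRESS = /\d+(\s(\D+|\d+\D{2})){3,6}\d{5}/

SAMPLE_TEXT     = 'abcd 123 abcd 1600 Penn Ave. Washington D.C. 12345 hello, world'
ORDINAL_ADDRESS = '1600 1st Ave. Washington D.C. 12345' # hypothetical input

# Returns the first substring that matches the pattern, or nil if none does.
def first_address(text)
  text[GROUPED_ADDRESS]
end

puts first_address(SAMPLE_TEXT)
# => 1600 Penn Ave. Washington D.C. 12345
puts first_address(ORDINAL_ADDRESS)
# => 1600 1st Ave. Washington D.C. 12345
```

The leading 123 is skipped because the text after it cannot be split into at least three whitespace-led groups that end right before a five-digit number.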

P.S. Rubular is your friend.

eeeeeean
.match(/\d+(\D)*?\d{5}/)

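I bet the above is what you want. Basically, if you don't want extra digits in between, you can use (\D) instead of (.). The additional ? tells the regex engine to match reluctantly rather than greedily. In other words, the quantifier consumes as few characters as it can while still letting the rest of the pattern match. Note that the match still begins at the leftmost position where the whole pattern can succeed, so a reluctant quantifier does not by itself guarantee the shortest possible match.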

A good question for greedy vs. reluctant.
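
As a rough illustration on the question's sample string, the sketch below compares a reluctant dot with the reluctant non-digit pattern from this answer. The constant names are made up for the example.

```ruby
# Sample string from the question; constant names are illustrative only.
QUESTION_SAMPLE = 'abcd 123 abcd 1600 Penn Ave. Washington D.C. 12345 hello, world'

# Reluctant dot: . can cross the digits in "1600", so the match still starts at "123".
LAZY_DOT = /\d+.*?\d{5}/
# Pattern from this answer: \D cannot cross another digit run,
# so no match can start at "123" and the engine moves on to "1600".
LAZY_NON_DIGIT = /\d+(\D)*?\d{5}/

puts QUESTION_SAMPLE[LAZY_DOT]
# => 123 abcd 1600 Penn Ave. Washington D.C. 12345
puts QUESTION_SAMPLE[LAZY_NON_DIGIT]
# => 1600 Penn Ave. Washington D.C. 12345
```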

Terry Li

The problem with your pattern is that regex quantifiers are greedy by default. .* is grabbing too much and needs to be told to be more selective. Also, . will match any type of character, including digits, which is probably not what you want: it lets a match that begins at 123 run straight through 1600.
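
To make the greedy behaviour concrete, here is a small sketch. The input string is hypothetical (it is not from the question) and was chosen so that two five-digit numbers follow the house number.

```ruby
# Hypothetical input with two five-digit numbers; not taken from the question.
TWO_ZIPS = '1600 Penn Ave. 12345 and 67890'

# Greedy .* runs to the end of the string and backs off only as far as it must,
# so the match ends at the LAST five-digit number.
puts TWO_ZIPS[/\d+.*\d{5}/]
# => 1600 Penn Ave. 12345 and 67890

# Reluctant .*? grows one character at a time,
# so the match ends at the FIRST five-digit number.
puts TWO_ZIPS[/\d+.*?\d{5}/]
# => 1600 Penn Ave. 12345
```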

I'd start with /(\d+\D+?\d{5})/ which captures:

1600 Penn Ave. Washington D.C. 12345

For example:

'a 123 a 1600 Penn Ave. Washington D.C. 12345 foo'[/(\d+\D+?\d{5})/, 1]
=> "1600 Penn Ave. Washington D.C. 12345"

The pattern means:

  1. Start with a minimum of one digit...
  2. Followed by at least one non-digit, selecting the minimal amount to reach to...
  3. A five-digit number.

Patterns that rely on \D between the house number and the zip code, like this one, will fail on an address that has a numerically named street, like 1st.
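
A quick sketch of that failure mode, using a made-up address with a numbered street (the constant name and input are illustrative only):

```ruby
# Hypothetical address with a numerically named street.
NUMBERED_STREET = '1600 1st Ave. Washington D.C. 12345'

# \D+? cannot step over the "1" in "1st", so no match can start at "1600".
# The engine moves on and the match starts at "1st" instead.
puts NUMBERED_STREET[/(\d+\D+?\d{5})/, 1]
# => 1st Ave. Washington D.C. 12345
```

The house number is silently dropped instead of the match failing outright, which is easy to miss in testing.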

the Tin Man