2

So I've been working for three days straight on a PHP script that grabs various bank information from across the States. Every single value I'm pulling works except the start of the address.

This doesn't have to be perfect, and I'm including the > and < symbols in the match to make it easier; I already have code that strips the greater-than and less-than characters after the fact. Note that I'm only interested in addresses that end with: Way, Street, St., St, Avenue, Ave, Ave., Road, Rd, Rd., Highway, Hwy, Hwy., Boulevard, Bvd, Bvd., Crescent, Cres., Cres, etc. These are examples of what I need to match:

         >20 Cross Street<
         >1 Dillinger Avenue<
         >189 Beautiful Way<
         >5768 Some Longer Address That Is Crazy Like Ave.<
         >857489 Monkey On My Back Highway<
         >378 My Pants Are Ablaze Boulevard<

Here is what I have so far:

     '~>[0-9]{1-7}.*\s[Street|St.|St|Road|Rd|Rd.]<~'

4 Answers

1

Escape the dots, and replace the dash in {1-7} with a comma: {1,7}.

[0-9]{1,7}.*\s(?:Street|St\.|St|Road|Rd|Rd\.)
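
To see why the dash matters, here is a minimal sketch, assuming PHP's PCRE functions and a sample line from the question. PCRE does not recognise {1-7} as a quantifier, so it treats the braces as literal text; the variable names are only for illustration.

```php
<?php
// Sample line taken from the question.
$line = '>20 Cross Street<';

// {1-7} is not a valid quantifier, so PCRE looks for a digit
// followed by the literal text "{1-7}".
$broken = preg_match('~>[0-9]{1-7}~', $line);

// {1,7} means "between one and seven of the preceding item".
$fixed = preg_match('~>[0-9]{1,7}~', $line);

var_dump($broken); // int(0)
var_dump($fixed);  // int(1)
```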
veith
  • 5 out of 10 failed on the attempts - will look further - thanks for your time ;-) –  Nov 01 '13 at 21:47
0

Well you need to make at least one vital change and several small changes:

'~>[0-9]{1,7}.*\s(?:Street|St\.?|Road|Rd\.?)<~'
                 ^^^                       ^

In your expression you used a character class. That's wrong because a character class is a set of individual characters, not words, and almost everything inside it is taken literally.

{1-7} is wrong; {1,7} matches the thing right before it between 1 and 7 times.

Also, you can't use . directly because it has a special meaning (it matches any character), so you need to escape it like this: \.

In other words [Street|St.|St|Road|Rd|Rd.] matches the individual characters and not the whole words, it even matches | literally.
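
The difference can be checked with a short sketch. It assumes PHP's preg_match() and the sample line from the question; the second test line, '>7 Route S<', is made up to show what the character class really accepts.

```php
<?php
// A character class is a set of single characters, so this wants
// whitespace, then ONE of S, t, r, e, |, ., R, o, a, d, then "<".
$charClass = '~>[0-9]{1,7}.*\s[Street|St.|St|Road|Rd|Rd.]<~';

// A non-capturing group with alternation matches whole words.
$group = '~>[0-9]{1,7}.*\s(?:Street|St\.?|Road|Rd\.?)<~';

$real = '>20 Cross Street<'; // sample from the question
$odd  = '>7 Route S<';       // made-up line, not a real address

var_dump(preg_match($charClass, $real)); // int(0)
var_dump(preg_match($group, $real));     // int(1)
var_dump(preg_match($charClass, $odd));  // int(1)
```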

Ibrahim Najjar
  • First with this one failed 7 out of 10 different pages. I bet it's close though - I'll play with it - thanks! –  Nov 01 '13 at 21:45
0

If you are looking for any address that includes any string from your list, you have to define it as part of the "matching patterns".

You can use the preg_match() function, which returns 1 if a match is found, 0 if not, and false on error.

A sample list of matching patterns can be:

/Street|St.*|Way|Avenue/ and similar.
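
As a rough sketch of how such a list could be used with preg_match(): the helper name endsWithStreetType() is hypothetical, the suffix list is abbreviated, and using [^<>]* instead of .* is my own assumption to keep the match inside one pair of > and < characters. The last sample line is made up as a non-address.

```php
<?php
// Hypothetical helper: true when the text between > and < starts with
// a 1-7 digit number and ends with one of the listed street types.
function endsWithStreetType(string $line): bool
{
    $pattern = '~>[0-9]{1,7}\s[^<>]*\s'
             . '(?:Street|St\.?|Avenue|Ave\.?|Way|Highway|Hwy\.?|Boulevard|Bvd\.?)<~';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $line) === 1;
}

$samples = [
    '>20 Cross Street<',
    '>5768 Some Longer Address That Is Crazy Like Ave.<',
    '>857489 Monkey On My Back Highway<',
    '>Open 24 hours<', // made-up non-address
];

foreach ($samples as $sample) {
    echo $sample, ' => ', endsWithStreetType($sample) ? 'match' : 'no match', PHP_EOL;
}
```

The first three lines should report a match and the last one should not. Real pages will likely need more suffixes and more forgiving whitespace handling.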

nbaroz
0

This is not a regex, but it may be a solution for parsing street addresses:

Parse A Steet Address into components

Even if this doesn't solve your problem, a regex is the wrong tool for it. You need a parser or a grammar, something more sophisticated than a regex.

You will drive yourself crazy trying to solve this with a regex.

Toby Allen
  • I believe I have already accomplished crazy this week - learned my first newbie lesson. Bought RegexBuddy and Magic, but the funny thing is you need a doctorate to use the tools. All other regex I've done has been easy, but this one has been a real stumbler. Thanks. –  Nov 01 '13 at 21:41