Can anyone help me with a quick regex problem?

I have the following HTML:

555 Some Street Name<BR />
New Providence VA 22901-1311<BR />
United States<BR />

The first row is always the Street.

The second row is the City (which can contain spaces), a space, the State abbreviation, a space, the Zip, a hyphen, and the 4-digit zip extension.

Third row is the Country.

I need to break the HTML into one variable per field. Can anyone provide a quick regex?

Edit: Maybe I wasn't clear. I need the following: Street address, City, State, Zip, 4Digit Zip, Country as individual variables.

Rohit Chopra
  • is it in different lines? Then you can just strip the tags and take it. – footy Jan 17 '12 at 16:30
  • What is "Zip" and "4Digit Zip" in this case? An example would be nice. (I come from Germany, and I am unaccustomed to this format). – Armin Jan 17 '12 at 16:35
  • To construct a regex see also [open source regexbuddy](http://stackoverflow.com/questions/89718/is-there) and [online regex testing](http://stackoverflow.com/questions/32282/regex-testing) for some helpful tools, or [regexp.info](http://regular-expressions.info/) for a nicer tutorial. – mario Jan 17 '12 at 16:37
  • the $zip in this case would be 22901 and 4 digit zip would be $fourZip = 1311 – Rohit Chopra Jan 17 '12 at 16:39
  • http://stackoverflow.com/a/1732454/665923 – Damien Jan 17 '12 at 16:44

4 Answers

2

This doesn't even require regular expressions. You can split the different lines using explode("<BR />",...). The first line is the Street and the last line is the Country. The middle line can be split using substr(), because the segment widths, counted from the end of the line, are constant: the last 4 characters are the 4-digit ZIP, the 6 characters before them are the ZIP followed by a hyphen, and the 3 characters before those are the state followed by a space. Whatever remains at the start of the line, minus the space that separates it from the state, is the City.
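A minimal sketch of that fixed-width idea, assuming every middle line really ends in a two-letter state, a five-digit ZIP, a hyphen and a four-digit extension. The variable names (`$address`, `$lines`, `$middle`) are only illustrative, and each piece is trimmed because the sample has a line break after every tag.

```php
<?php
// Hypothetical input: the address block from the question as one string.
$address = "555 Some Street Name<BR />\nNew Providence VA 22901-1311<BR />\nUnited States<BR />";

// Split on the tag, then trim the newlines left around each piece.
$lines = array_map('trim', explode('<BR />', $address));

$street  = $lines[0];
$middle  = $lines[1];   // "New Providence VA 22901-1311"
$country = $lines[2];

// Widths counted from the end of the line:
// 4 (ZIP+4), 1 (hyphen), 5 (ZIP), 1 (space), 2 (state), 1 (space).
$fourZip = substr($middle, -4);       // last 4 characters
$zip     = substr($middle, -10, 5);   // 5 digits before the hyphen
$state   = substr($middle, -13, 2);   // 2 letters before the ZIP
$city    = substr($middle, 0, -14);   // everything before " VA 22901-1311"
```

If a line can lack the ZIP+4 extension, the offsets no longer hold and the line would need a length check first.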

Simon
  • I like this. If I have to run the same thing on 100,000 rows of data, would substr still be better than preg_match? – Rohit Chopra Jan 17 '12 at 16:40
  • Definitely. I don't have numbers right now, but substr should run much faster than preg_match – Simon Jan 17 '12 at 18:20
1
555 Some Street Name<BR />
New Providence VA 22901-1311<BR />
United States<BR />

OK, for the first part, let's split the lines:

$array = explode('<BR />', $address);

Now the second line needs to be parsed as well:

// $array[1] is now "New Providence VA 22901-1311" (possibly with surrounding whitespace)

$tmp = explode(' ', trim($array[1]));

All that is left is to assign everything to the correct variable names:

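$fullZip = array_pop($tmp);           // last token: "22901-1311"
$zipArray = explode('-', $fullZip);
$zip = $zipArray[0];                  // "22901"
$Digitzip = $zipArray[1];             // "1311"
$state = array_pop($tmp);             // "VA"
$city = implode(' ', $tmp);           // remaining tokens, rejoined with spaces: "New Providence"
$country = trim($array[2]);           // trim the line break left over from the HTML
$street = trim($array[0]);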
Charles Forest
0

No need for a regex.

$htmlStr = '555 Some Street Name<BR />New Providence VA 22901-1311<BR />United States<BR />';


Note, however, that for more complicated HTML parsing, regexes are not the tool for the job.
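If a regex is wanted anyway for the single fixed-format middle line, it stays manageable. This is only a sketch, assuming an uppercase two-letter state and an always-present ZIP+4. `$middle`, `$pattern` and `$m` are illustrative names, not part of the code above.

```php
<?php
// Middle line of the address, already separated from the <BR /> tags.
$middle = 'New Providence VA 22901-1311';

// Groups: city (lazy, may contain spaces), 2-letter state,
// 5-digit ZIP, 4-digit extension.
$pattern = '/^(.+?)\s+([A-Z]{2})\s+(\d{5})-(\d{4})$/';

if (preg_match($pattern, $middle, $m)) {
    // $m[0] is the whole match, so skip it.
    list(, $city, $state, $zip, $fourZip) = $m;
} else {
    // The line did not have the expected shape.
    $city = $state = $zip = $fourZip = null;
}
```

The anchors `^` and `$` force the state and ZIP groups to sit at the end of the line, so the lazy city group expands across any spaces in the city name.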

Alex Turpin
0
$array = explode('<BR />', $address);

This is the easiest way: just split the string on the <BR /> tags. If you can avoid regular expressions, you should, because they are generally slower than simple string operations such as explode.
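One detail to watch, sketched below under the assumption that the input looks exactly like the sample: explode() matches the delimiter case-sensitively, and because the string ends with a tag, it returns a trailing empty piece. The names `$parts` and `$lines` are just for illustration.

```php
<?php
// Hypothetical input, with a line break after each tag as in the question.
$address = "555 Some Street Name<BR />\nNew Providence VA 22901-1311<BR />\nUnited States<BR />";

$parts = explode('<BR />', $address);   // 4 pieces; the last one is empty
$parts = array_map('trim', $parts);     // drop the newlines around each piece

// Remove empty pieces and reindex so the lines are at 0, 1 and 2.
$lines = array_values(array_filter($parts, 'strlen'));
```

After this, `$lines[0]` is the street, `$lines[1]` the city/state/ZIP line and `$lines[2]` the country.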

Armin
  • Maybe I wasn't clear. I need the following: Street address, City, State, Zip, 4Digit Zip, Country as individual variables. – Rohit Chopra Jan 17 '12 at 16:33
  • You just need to work with $array[1], because [0] and [2] already contain what you want. You could use `strpos` in combination with `substr`, and split this part at the "VA" string. – Armin Jan 17 '12 at 16:37
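A rough sketch of the position-based idea from the last comment. Instead of searching for a hard-coded "VA", this variant uses `strrpos` to find the last space each time, so it should work for any state. It assumes the middle line has already been trimmed, and `$middle`, `$rest` and `$pos` are made-up names.

```php
<?php
$middle = 'New Providence VA 22901-1311';

// Peel fields off the right-hand side, cutting at the last space each time.
$pos     = strrpos($middle, ' ');
$fullZip = substr($middle, $pos + 1);   // "22901-1311"
$rest    = substr($middle, 0, $pos);    // "New Providence VA"

$pos   = strrpos($rest, ' ');
$state = substr($rest, $pos + 1);       // "VA"
$city  = substr($rest, 0, $pos);        // "New Providence"

// Split the full ZIP at the hyphen.
list($zip, $fourZip) = explode('-', $fullZip);
```

Working from the right is what lets the city contain any number of spaces.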