Breaking a given HTML code using Reg Exp

Question

Can anyone help me with a quick regex problem?

I have the following HTML:

555 Some Street Name<BR />
New Providence VA 22901-1311<BR />
United States<BR />

The first row is always the Street.

The second row is the City (which can contain spaces), a space, the State abbreviation, a space, the Zip, a hyphen, and the 4-digit Zip.

The third row is the Country.

I need to break the HTML into one variable per field. Can anyone provide a quick regex?

Edit: Maybe I wasn't clear. I need the following: Street address, City, State, Zip, 4Digit Zip, Country as individual variables.

Is it on different lines? Then you can just strip the tags and take it. – footy Jan 17 '12 at 16:30
What is "Zip" and "4Digit Zip" in this case? An example would be nice. (I come from Germany, and I am unaccustomed to this format.) – Armin Jan 17 '12 at 16:35
To construct a regex see also [open source regexbuddy](http://stackoverflow.com/questions/89718/is-there) and [online regex testing](http://stackoverflow.com/questions/32282/regex-testing) for some helpful tools, or [regexp.info](http://regular-expressions.info/) for a nicer tutorial. – mario Jan 17 '12 at 16:37
The $zip in this case would be 22901, and the 4-digit zip would be $fourZip = 1311. – Rohit Chopra Jan 17 '12 at 16:39

score 2 · Answer 1 · answered Jan 17 '12 at 16:35

This doesn't even require regular expressions. You can split the different lines using explode("<BR />", ...). The first line is the street and the last line is the country. The middle line can be split using substr(), because its tail has a fixed layout: the last 4 characters are the 4-digit ZIP, the 6 characters before them are the ZIP followed by a hyphen, and the 3 characters before those are the state followed by a space. Counted from the end of the line, the segment lengths are therefore constant, and whatever precedes them is the city.
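The fixed-width idea above can be sketched roughly as follows. This is only an illustration, not code from the answer: the function name parseAddressFixedWidth and the array keys are made up here, and it assumes the tags are spelled exactly `<BR />` and that the middle line always ends in a 2-letter state, a 5-digit ZIP, a hyphen, and a 4-digit extension.

```php
<?php
// Hypothetical helper: split on <BR /> and slice the middle line from the end.
function parseAddressFixedWidth(string $html): array
{
    // explode() leaves newlines around each piece and an empty trailing piece,
    // so trim every piece and drop the empty ones.
    $pieces = array_map('trim', explode('<BR />', $html));
    $lines  = array_values(array_filter($pieces, fn(string $s): bool => $s !== ''));

    if (count($lines) < 3) {
        throw new InvalidArgumentException('Expected three address lines.');
    }

    [$street, $middle, $country] = $lines;

    return [
        'street'  => $street,
        // Everything before the last 13 characters ("SS ZZZZZ-ZZZZ") is the city.
        'city'    => rtrim(substr($middle, 0, -13)),
        'state'   => substr($middle, -13, 2),
        'zip'     => substr($middle, -10, 5),
        'zip4'    => substr($middle, -4),
        'country' => $country,
    ];
}
```

Calling it on the sample from the question should yield "New Providence" as the city and "VA" as the state. Note that nothing here validates the input, so a line that does not follow the fixed layout would be sliced into wrong pieces without any error.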

Simon


I like this. If I have to run the same thing on 100,000 rows of data, would substr still be better than preg_match? – Rohit Chopra Jan 17 '12 at 16:40
Definitely. I don't have numbers right now, but substr should run much faster than preg_match. – Simon Jan 17 '12 at 18:20
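No timings were posted in the thread, so one way to check this on your own machine is a small micro-benchmark like the sketch below. The function names splitWithSubstr and splitWithPregMatch and the regex pattern are assumptions made for illustration, and the result will depend on the PHP version and the data.

```php
<?php
// Hypothetical micro-benchmark: both functions split the middle address line.
function splitWithSubstr(string $line): array
{
    // Fixed-width slicing from the end: "SS ZZZZZ-ZZZZ" is 13 characters.
    return [
        rtrim(substr($line, 0, -13)),
        substr($line, -13, 2),
        substr($line, -10, 5),
        substr($line, -4),
    ];
}

function splitWithPregMatch(string $line): array
{
    // City (may contain spaces), 2-letter state, 5-digit ZIP, 4-digit extension.
    if (!preg_match('/^(.+) ([A-Z]{2}) (\d{5})-(\d{4})$/', $line, $m)) {
        return [];
    }
    return array_slice($m, 1); // drop the full match, keep the four groups
}

$line       = 'New Providence VA 22901-1311';
$iterations = 100000;

foreach (['splitWithSubstr', 'splitWithPregMatch'] as $fn) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($line);
    }
    printf("%s: %.4f s\n", $fn, microtime(true) - $start);
}
```

One design difference to keep in mind when comparing the numbers: the preg_match version also checks that the line has the expected shape and returns an empty array otherwise, while the substr version slices whatever it is given.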

Charles Forest · Accepted Answer · 2012-01-17T16:45:01.153

555 Some Street Name<BR />
New Providence VA 22901-1311<BR />
United States<BR />

OK, for the first part, let's split the lines:

$array = explode('<BR />', $address);

now you need to parse the information from the second line as well...

// $array[1] now holds "New Providence VA 22901-1311"
// (it may carry a leading newline from the source, hence the trim() below)

$tmp = explode(' ', trim($array[1]));

and all you need now is to put everything into the correct variables

$fullZip = array_pop($tmp);           // "22901-1311"
$zipArray = explode('-', $fullZip);
$zip = $zipArray[0];                  // "22901"
$Digitzip = $zipArray[1];             // "1311"
$state = array_pop($tmp);             // "VA"
$city = implode(' ', $tmp);           // "New Providence"; the glue keeps the space in the city name
$country = trim($array[2]);
$street = trim($array[0]);

score 0 · Answer 3 · edited May 23 '17 at 11:48

No need for a regex.

$htmlStr = '555 Some Street Name<BR />New Providence VA 22901-1311<BR />United States<BR />';

Live example

Note, however, that for more complicated HTML parsing, regexes are not the tool for the job.

answered Jan 17 '12 at 16:31

Alex Turpin


Wrong programming language! ;-) – Armin Jan 17 '12 at 16:32
@Armin oops, I feel stupid now, I'm used to answering JS questions haha. Edited answer. – Alex Turpin Jan 17 '12 at 16:52

score 0 · Answer 4 · answered Jan 17 '12 at 16:31

$array = explode('<BR />', $address);

This is the easiest way: just split the string at the <BR /> tags. If you can avoid regular expressions, you should, because they are generally slower than simple string operations such as explode.

Armin


Maybe I wasn't clear. I need the following: Street address, City, State, Zip, 4Digit Zip, Country as individual variables. – Rohit Chopra Jan 17 '12 at 16:33
You just need to work with $array[1], because [0] and [2] already contain what you want. You could use `strpos` in combination with `substr` and split this part at the "VA" string. – Armin Jan 17 '12 at 16:37
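A rough sketch of the idea in the comment above, under the assumption that the state abbreviation is known beforehand (which is the main limitation of this approach). The function name splitAroundState and its parameters are hypothetical, and it uses `strrpos` rather than `strpos` so that a city name that happens to contain the same letters does not confuse the split.

```php
<?php
// Hypothetical helper: split the middle line around a known state abbreviation.
function splitAroundState(string $line, string $state): ?array
{
    $needle = ' ' . $state . ' ';
    // Search from the right, so the last " VA " in the line is used.
    $pos = strrpos($line, $needle);
    if ($pos === false) {
        return null; // the state was not found in this line
    }

    $city    = substr($line, 0, $pos);
    $fullZip = substr($line, $pos + strlen($needle));

    // array_pad() keeps the destructuring safe when there is no hyphen.
    [$zip, $zip4] = array_pad(explode('-', $fullZip, 2), 2, '');

    return ['city' => $city, 'state' => $state, 'zip' => $zip, 'zip4' => $zip4];
}
```

For the sample line, `splitAroundState('New Providence VA 22901-1311', 'VA')` should give the city "New Providence", the zip "22901", and the extension "1311". For data covering many states, the approaches in the other answers avoid having to know the state in advance.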
