I want to extract an author's name from an HTML tag. The tag looks like this:
<a href="http://somewhere.com"> Manfred </a>
but if the name is too long, it looks like this:
<a title="floormanager004" href="http://somewhere.com"> floormanage... </a>
I have the following regex to cover both cases:
~<a.*(title="(.{2,50})".*|>(.*))</a>~Usi
This works fine in the second case, returning a two-dimensional array like this:
array(2) {
[0]=>
string "title="floormanager004" href="http://somewhere.com"> floormanage... "
[1]=>
string "floormanager004"
}
But for the first case, the array contains an additional empty field:
array(3) {
[0]=>
string "> Manfred "
[1]=>
string ""
[2]=>
string " Manfred "
}
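A minimal sketch that reproduces both results. The original call is not shown, so this assumes the pattern is run with preg_match() and that the full match at index 0 is left out of the dumps above; the names $tags and $results are only for illustration.

```php
<?php
// Pattern from the question: U = ungreedy, s = dot matches newlines, i = case-insensitive.
$pattern = '~<a.*(title="(.{2,50})".*|>(.*))</a>~Usi';

$tags = [
    '<a href="http://somewhere.com"> Manfred </a>',
    '<a title="floormanager004" href="http://somewhere.com"> floormanage... </a>',
];

$results = [];
foreach ($tags as $tag) {
    // Assumption: preg_match() is the function in use.
    preg_match($pattern, $tag, $m);
    // Drop index 0 (the full match) so the indices line up with the dumps above.
    $results[] = array_slice($m, 1);
}

var_dump($results[0]); // first case: three entries, the middle one is an empty string
var_dump($results[1]); // second case: two entries
```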
Why does this field appear, and how do I get rid of it?
Disclaimer: I know that when you use regex to parse HTML you're gonna have a baaaaad time and you should never ever ever do this, but in my case it has proven to be faster than XPath and the like. Please don't comment on this.