I'm parsing data from a very long HTML table; right now my code does this with the DOMDocument, DOMElement, etc. classes. I wanted to run a performance test comparing the current method against extracting the data with a regular expression, but I can't get the expression right.
An HTML row of the table looks like this:
<tr><td> JON SMITH </td><td> 2000-09-29 </td></tr>
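For comparison, here is a minimal sketch of what a DOM-based parse of rows like this might look like. The original DOM code isn't shown, so this is not that code. The function name `parseRowsWithDom` and the assumption that each `<tr>` holds a name cell followed by a date cell are illustrative only.

```php
<?php
// Hypothetical DOM-based parser; assumes each <tr> holds a name <td>
// followed by a date <td>, as in the sample row above.
function parseRowsWithDom(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about fragments/malformed markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        if ($cells->length < 2) {
            continue; // skip header or malformed rows
        }
        // Flat list: name, date, name, date, ...
        $result[] = trim($cells->item(0)->textContent);
        $result[] = trim($cells->item(1)->textContent);
    }
    return $result;
}

$html = '<table><tr><td> JON SMITH </td><td> 2000-09-29 </td></tr></table>';
print_r(parseRowsWithDom($html));
```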
And the expression I've been attempting looks something like this:
/(?:<td>([a-zA-Z\s]*?)<\/td><td>([0-9-\s]*?)<\/td>)/
The problem with this expression is that it returns the entire row contents rather than just the contents of each cell. Ideally the preg_match_all results would be name, date, name, date, and so on.
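One likely explanation: preg_match_all puts the whole match in `$matches[0]` and each capture group in its own sub-array, so the names and dates may already be sitting in `$matches[1]` and `$matches[2]`. The sketch below assumes names contain only letters and spaces and dates only digits and hyphens. It moves `\s*` outside the groups so the padding spaces are not captured, and uses `PREG_SET_ORDER` to build the flat name, date, name, date list.

```php
<?php
// \s* outside the capture groups keeps the cell padding out of the results.
// Assumes names are letters/spaces only and dates are digits/hyphens only.
$pattern = '/<td>\s*([a-zA-Z ]+?)\s*<\/td><td>\s*([0-9-]+)\s*<\/td>/';

$html = '<tr><td> JON SMITH </td><td> 2000-09-29 </td></tr>'
      . '<tr><td> JANE DOE </td><td> 1999-01-15 </td></tr>';

// Default (PREG_PATTERN_ORDER): $matches[0] = full matches (the row text),
// $matches[1] = all names, $matches[2] = all dates.
preg_match_all($pattern, $html, $matches);
print_r($matches[1]);
print_r($matches[2]);

// PREG_SET_ORDER groups results per match, which makes interleaving easy.
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);
$flat = [];
foreach ($sets as $set) {
    $flat[] = $set[1]; // name
    $flat[] = $set[2]; // date
}
print_r($flat);
```

The outer `(?: ... )` group in the original expression is not needed; it groups the whole pattern without changing what is captured.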
Is this a reasonable thing to do, or should I stick with the DOM technique? If it is reasonable, could someone lend a hand with the regex?
Thanks!
EDIT: In case anyone stumbles upon this in the future, the regex solution performs far better than the DOM classes did; in my situation it's the difference between seconds and minutes.
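To repeat the comparison on your own data, a rough timing harness along these lines may help. It is a sketch, not the benchmark used above. The synthetic table, the 20,000-row count and the helper names (`buildTable`, `viaDom`, `viaRegex`, `timeIt`) are all assumptions. It also checks that both methods return the same values before comparing their times.

```php
<?php
// Hypothetical harness: builds a synthetic table of identical rows.
function buildTable(int $rows): string
{
    $html = '<table>';
    for ($i = 0; $i < $rows; $i++) {
        $html .= '<tr><td> JON SMITH </td><td> 2000-09-29 </td></tr>';
    }
    return $html . '</table>';
}

// DOM approach: flat name, date, name, date ... list.
function viaDom(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $out = [];
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        if ($cells->length >= 2) {
            $out[] = trim($cells->item(0)->textContent);
            $out[] = trim($cells->item(1)->textContent);
        }
    }
    return $out;
}

// Regex approach: same output shape as viaDom.
function viaRegex(string $html): array
{
    preg_match_all('/<td>\s*([a-zA-Z ]+?)\s*<\/td><td>\s*([0-9-]+)\s*<\/td>/',
        $html, $sets, PREG_SET_ORDER);
    $out = [];
    foreach ($sets as $set) {
        $out[] = $set[1];
        $out[] = $set[2];
    }
    return $out;
}

// Wall-clock seconds for a single call.
function timeIt(callable $fn, string $html): float
{
    $start = microtime(true);
    $fn($html);
    return microtime(true) - $start;
}

$html = buildTable(20000);
var_dump(viaDom($html) === viaRegex($html)); // sanity check: same results
printf("DOM:   %.3f s\n", timeIt('viaDom', $html));
printf("regex: %.3f s\n", timeIt('viaRegex', $html));
```

A single run is noisy; repeating each measurement a few times gives a fairer picture.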