I originally wasn't going to answer this, but reading Tom Lord's link to the mystical Regex parsing of XML made me reconsider.
Regex CAN be used to parse all the examples shown, because the XHTML is "fluff" and is entirely unimportant for finding the number(s). Yes, some XHTML could contain 6 numeric characters in a row elsewhere, but that's unlikely at best, and for the perceived scale of this application (i.e. not complex or massive, judging by the snippets given), it's doubtful that will be an issue.
The resultant output is not [X]HTML dependent in any form.
Quote:
Snippet 1:
<table><tr><td>432987</td></tr></table>
Snippet 2:
<div>164PE 09983
PO#432987</div>
Snippet 3:
Order 432987IRC
Snippet 4:
432987
To solve all of these and return your missing number, 432987, you can simply do this:
$string = "<table><tr><td>432987</td></tr></table>"; // or any other snippet from above
preg_match_all("/[0-9]{6}/", $string, $match);
This will match any run of 6 consecutive digits; every match found is collected in $match[0].
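One caveat worth knowing: the pattern does not care what surrounds the six digits, so a longer run of digits also produces a match (its first six digits). If that ever matters for your data, a lookaround-anchored pattern is one option. This is only a rough sketch; the sample string and the variable names $sample, $loose and $strict are made up for illustration and are not from the question.

```php
<?php
// Hypothetical input: a six-digit order number next to a longer digit run.
$sample = "PO#432987 ref 1234567890";

// Plain pattern: also picks up the first six digits of the longer run.
preg_match_all("/[0-9]{6}/", $sample, $loose);
print_r($loose[0]);

// Lookbehind and lookahead reject any match with a digit on either side,
// so only runs of exactly six digits are kept.
preg_match_all("/(?<![0-9])[0-9]{6}(?![0-9])/", $sample, $strict);
print_r($strict[0]);
```

For the snippets in the question both patterns return the same result, so the stricter version is only needed if longer numbers can appear in the input.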
Full Proof:
$string1 = "<table><tr><td>432987</td></tr></table>";
$string2 = "<div>164PE
09983 PO#432987</div>";
$string3 = "Order 432987IRC";
$string4 = "432987";
$string5 = "<html><head><title>Some numbers</title></head>
<body><h2>Oh my word, this is HTML being attacked by Regex!!!</h2>
<p>This must be Doooom! 123456</p>
</body>
</html>";
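// Run the same pattern against every sample string.
foreach ([$string1, $string2, $string3, $string4, $string5] as $string) {
    preg_match_all("/[0-9]{6}/", $string, $match);
    print_r($match);
}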
Alternatively, you can use the regex digit shorthand \d, like so:
preg_match_all("/\d{6}/", $string5, $match);
Does exactly the same thing.
I have assumed you want a 6-digit number. If you already know what the number is and it will not change, I suspect it's easier to use PHP's string find and replace functions, such as str_replace, etc.
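If the number really is known in advance, a plain substring check avoids regex altogether. A minimal sketch, assuming the number is held in a variable I have called $needle; that name, $haystack, and the "[order]" placeholder are mine, not from the question.

```php
<?php
$haystack = "<div>164PE 09983 PO#432987</div>";
$needle = "432987"; // assumed to be known up front

// strpos returns the offset of the first occurrence, or false if absent.
// Compare with !== false, because a match at offset 0 is also falsy.
$found = strpos($haystack, $needle) !== false;

// str_replace swaps every occurrence; "[order]" is just a placeholder.
$masked = str_replace($needle, "[order]", $haystack);

var_dump($found);
echo $masked, "\n";
```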
Edit: Some Further reading.