PHP preg_match_all - group without returning a match

Question

How would I get content from HTML between h3 tags inside an element that has class pricebox? For example, the following string fragment

<!-- snip a lot of other html content -->
<div class="pricebox">
    <div class="misc_info">Some misc info</div>
    <h3>599.99</h3>
</div>
<!-- snip a lot of other html content -->

The catch is 599.99 has to be the first match returned, that is if the function call is

preg_match_all($regex,$string,$matches)

the 599.99 has to be in $matches[0][1] (because I use the same script to get numbers from dissimilar looking strings with different $regex - the script looks for the first match).

Seriously? Again? [Parsing HTML with regular expressions](http://stackoverflow.com/a/1732454/1023815)? — Adam Zalcman, Mar 23 '12 at 00:34
Try this for DOM manipulation: http://simplehtmldom.sourceforge.net/ PHP has some awesome DOM manipulation support as well. Most good programmers do not recommend using regex for DOM parsing. — Khurram Ijaz, Mar 23 '12 at 00:37
Well the answer you point to sounds a bit hysterical. HTML is just a string, it's not magical, and I need to match something between the first pair of h3 tags (again just strings) that come up after a substring 'class="pricebox"'. — DMIL, Mar 23 '12 at 00:43
Thanks Mian, that sounds useful, but I need something that is independent of the actual PHP that's doing the parsing - I paste a regex into a CMS and the script uses that regex to get the data. — DMIL, Mar 23 '12 at 00:50
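
For readers who, like the asker, can only supply a pattern: one regex-only option is PCRE's `\K`, which drops everything matched so far from the reported match. The sketch below is not from the thread. The sample string is made up, and it assumes the script reads the first full match (`$matches[0][0]` in the default `PREG_PATTERN_ORDER`). It also carries the usual fragility of regex on HTML: it breaks if the markup nests another h3 first or formats the class attribute differently.

```php
<?php
// Hypothetical sample input, modelled on the fragment in the question.
$string = <<<'HTML'
<div class="other"><h3>12.00</h3></div>
<div class="pricebox">
    <div class="misc_info">Some misc info</div>
    <h3>599.99</h3>
</div>
HTML;

// Find class="pricebox", skip lazily to the next <h3>, then \K resets the
// start of the reported match so only the text up to the next "<" is returned.
// The "s" flag lets "." match newlines.
$regex = '~class="pricebox".*?<h3>\K[^<]+~s';

preg_match_all($regex, $string, $matches);

// No capture group is needed: the price is the whole match.
echo $matches[0][0], PHP_EOL;
```

Because the price is the full match rather than a capture group, nothing else in the calling script has to change.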

score 1 · Accepted Answer · answered Mar 23 '12 at 00:44

Try using XPath; definitely NOT RegEx.

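Code:

// Parse the page; "@" suppresses the warnings libxml raises on malformed HTML
$html = new DOMDocument();
@$html->loadHTMLFile('http://www.path.to/your_html_file_html');

// Select every <h3> that is a direct child of <div class="pricebox">
$xpath = new DOMXPath($html);
$nodes = $xpath->query("//div[@class='pricebox']/h3");

// Print the text content of each matching node, one per line
foreach ($nodes as $node) {
    echo $node->nodeValue . PHP_EOL;
}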

Dr.Kameleon

Thanks, I'll check it out. What I need is to be able to paste a matching pattern into a CMS and have the script handle it, without altering the script in any way for completely different strings. This looks promising. – DMIL Mar 23 '12 at 00:54
@DMIL For customisable query strings regarding HTML parsing, `XPath` is definitely the way to go... (and it's REALLY easy to understand; and much easier to handle than `RegEx`...) – Dr.Kameleon Mar 23 '12 at 00:56
But what if there's content between '<h3>' tags like '<h3>only $599.99</h3>'? How would I get that number with XPath? I can't use XPath and then regex because whatever pattern that gets the number needs to be entered in a text field in the CMS. I suppose I could have two fields, one for the XPath pattern, the other for a regex to clean up whatever XPath returns but... that's a pain in the ass too... – DMIL Mar 23 '12 at 01:21
@DMIL Well, what XPath does is simply to traverse a "branch" of the... HTML tree structure and fetch its value... e.g. `/html/body/div/p/div/h3`. Don't confuse it with RegEx. In your example, XPath would return `only $599.99`, and getting JUST the numeric value would be a whole different issue (that one, probably REQUIRING RegEx...). Seems like a pain in the ass? Probably. But, still it's simpler 'coz you'll be using the different coding techniques for what they were 'designed' for... ;-) – Dr.Kameleon Mar 23 '12 at 01:26
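
The two-step idea from the comments above (XPath to find the node, then a regex to pull out the number) might look roughly like this. It is only a sketch: the inline `$source` string and the number pattern are assumptions, not anything from the asker's CMS. The last lines show a possible single-expression alternative using the XPath 1.0 `substring-after()` function, which works only if the price always follows a literal `$`.

```php
<?php
// Hypothetical input: the price wrapped in extra text, as in the comment above.
$source = '<div class="pricebox"><div class="misc_info">Some misc info</div>'
        . '<h3>only $599.99</h3></div>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$html = new DOMDocument();
$html->loadHTML($source);
$xpath = new DOMXPath($html);

// Step 1: XPath locates the node and returns its full text ("only $599.99").
$nodes = $xpath->query("//div[@class='pricebox']/h3");
$text  = $nodes->length > 0 ? $nodes->item(0)->nodeValue : '';

// Step 2: a small regex extracts the first decimal number from that text.
$price = preg_match('/\d+(?:\.\d+)?/', $text, $m) ? $m[0] : null;
echo $price, PHP_EOL;

// Alternative: one XPath expression, assuming the price always follows "$".
// evaluate() returns a string here because substring-after() yields a string.
$viaXPath = $xpath->evaluate('substring-after(string(//div[@class="pricebox"]/h3), "$")');
echo $viaXPath, PHP_EOL;
```

The second form would fit a single CMS text field, at the cost of depending on the exact text around the number.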
