
Is there a way to run an XPath query on a DOMNode, or at least convert it to a DOMXPath?

<html>
  ...
  <div id="content">
     ...
     <div class="listing">
         ...
         <div></div>
         <div></div>
         <div  class='foo'>
           <h3>Get me 1</h3>
           <a>and me too 1</a>
         </div>
     </div>
     <div class="listing">
         ...
         <div></div>
         <div></div>
         <div class='foo'>
           <h3>Get me 2</h3>
           <a>and me too 1</a>
         </div>
     </div>
     ....
  </div>
</html>

I am trying to build a list of arrays, where each array holds the values of the h3 and a tags from one listing. To do that, I need to get each listing and then get the h3 and a values inside it. This is my code:

$html_dom = new DOMDocument();
@$html_dom->loadHTML($html);
$x_path = new DOMXPath($html_dom);

$nodes= $x_path->query("//div[@id='content']//div[@class='listing']");

foreach ($nodes as $node)
{
  // I want to further dig down here using query on a DOMNode
}
Edited by Pang; asked by developarvin
  • While it's possible to query from a particular node, you could simply query for all the divs with the class foo, or for the last div child of each listing, and get the values immediately. – Gordon May 24 '13 at 05:02
  • I was thinking of getting the listings first and then querying the values inside each one, so that I can easily put them into an array with that structure. But I guess I could just match the indexes of the h3 and a results if I want to. – developarvin May 24 '13 at 05:13

3 Answers


Pass the node as the second argument to DOMXPath::query

contextnode: The optional contextnode can be specified for doing relative XPath queries. By default, the queries are relative to the root element.

Example:

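foreach ($nodes as $node) {
    // h3 and a sit inside div.foo, not directly under the listing,
    // so use descendant paths (.//) relative to the context node $node
    foreach ($x_path->query('.//h3|.//a', $node) as $child) {
        echo $child->nodeValue, PHP_EOL;
    }
}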

This uses the | (union) operator, which gives

Get me 1
and me too 1
Get me 2
and me too 1
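Applied to the question's actual goal (one array per listing holding its h3 and a values), the context-node approach could look like the sketch below. It assumes the question's markup with the "..." placeholders dropped; the 'h3'/'a' array keys and the use of libxml_use_internal_errors() instead of the @ operator are illustrative choices, not part of the original code.

```php
<?php
// The question's markup, with the "..." placeholders omitted (assumption).
$html = <<<'HTML'
<html><body>
<div id="content">
  <div class="listing">
    <div></div>
    <div></div>
    <div class='foo'>
      <h3>Get me 1</h3>
      <a>and me too 1</a>
    </div>
  </div>
  <div class="listing">
    <div></div>
    <div></div>
    <div class='foo'>
      <h3>Get me 2</h3>
      <a>and me too 1</a>
    </div>
  </div>
</div>
</body></html>
HTML;

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$listings = $xpath->query("//div[@id='content']//div[@class='listing']");

$results = [];
foreach ($listings as $listing) {
    // Passing $listing as the context node makes both queries search only inside it.
    $h3 = $xpath->query('.//h3', $listing)->item(0);
    $a  = $xpath->query('.//a', $listing)->item(0);
    $results[] = [
        'h3' => $h3 ? trim($h3->nodeValue) : null,
        'a'  => $a ? trim($a->nodeValue) : null,
    ];
}

print_r($results);
```

Querying h3 and a separately per listing keeps each pair together, so there is no need to match indexes across two flat result lists.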

If you don't need any complex querying, you can also do

foreach ($nodes as $node) {
    foreach ($node->getElementsByTagName('a') as $a) {
      echo $a->nodeValue, PHP_EOL;
    }
}

Or even by iterating the child nodes (note that this includes all the text nodes)

foreach ($nodes as $node) {
    foreach ($node->childNodes as $child) {
      echo $child->nodeName, PHP_EOL;
    }
}
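To see the text-node caveat concretely, here is a minimal sketch that keeps only element children by comparing nodeType with the XML_ELEMENT_NODE constant. It assumes a single listing fragment modeled on the question's markup, including its "..." text.

```php
<?php
// One listing modeled on the question's markup, including the "..." text (assumption).
$html = '<div class="listing"> ... <div></div><div></div>'
      . '<div class="foo"><h3>Get me 1</h3><a>and me too 1</a></div></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$listing = $xpath->query("//div[@class='listing']")->item(0);

// childNodes also contains DOMText nodes such as the " ... " text,
// so skip everything that is not an element.
$elementNames = [];
foreach ($listing->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $elementNames[] = $child->nodeName;
    }
}

print_r($elementNames);
```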

However, all of that is unneeded since you can fetch these nodes directly:

$nodes= $x_path->query("/html/body//div[@class='listing']/div[last()]");

foreach ($nodes as $i => $node) {
    echo $i, $node->nodeValue, PHP_EOL;
}

will give you the last div child of each div whose class attribute value is listing (two nodes here) and output their combined text node values, including whitespace:

0
           Get me 1
           and me too 1

1
           Get me 2
           and me too 1

Likewise, the following

"//div[@class='listing']/div[last()]/node()[name() = 'h3' or name() = 'a']"

will give you the four h3 and a child nodes, and the same loop outputs

0Get me 1
1and me too 1
2Get me 2
3and me too 1

If you need to differentiate these by name while iterating over them, you can do

foreach ($nodes as $i => $node) {
    echo $i, $node->nodeName, $node->nodeValue, PHP_EOL;
}

which will then give

0h3Get me 1
1aand me too 1
2h3Get me 2
3aand me too 1
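If you still want one record per listing, the flat list from that query can be grouped by nodeName. This is a sketch under the assumption that every listing contributes an h3 followed by its a, in document order; the $records name and the array keys are illustrative.

```php
<?php
// Simplified version of the question's markup (assumption).
$html = <<<'HTML'
<div id="content">
  <div class="listing"><div></div><div></div>
    <div class='foo'><h3>Get me 1</h3><a>and me too 1</a></div>
  </div>
  <div class="listing"><div></div><div></div>
    <div class='foo'><h3>Get me 2</h3><a>and me too 1</a></div>
  </div>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$nodes = $xpath->query(
    "//div[@class='listing']/div[last()]/node()[name() = 'h3' or name() = 'a']"
);

// An h3 starts a new record; the following a fills in that record.
$records = [];
foreach ($nodes as $node) {
    if ($node->nodeName === 'h3') {
        $records[] = ['h3' => $node->nodeValue, 'a' => null];
    } elseif ($node->nodeName === 'a' && $records) {
        $records[count($records) - 1]['a'] = $node->nodeValue;
    }
}

print_r($records);
```

This relies on ordering, so if a listing could lack an h3 or have several a elements, the per-listing context-node queries are the safer choice.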
Answered by Gordon
  • What if he wanted to store h3 and a in different variables, something like this: http://stackoverflow.com/questions/43131400/xpath-query-get-child-nodes-in-a-parent-node-using-a-loop – DragonFire Mar 31 '17 at 05:43
  • The query must be relative, as described in https://bugs.php.net/bug.php?id=34413 and shown in the next answer. – eleuteron May 09 '19 at 05:45

Provide your $node as a context node.

foreach ($nodes as $node)
{
   $morenodes = $x_path->query(".//h3", $node);
}

See $contextnode in the manual: http://php.net/manual/en/domxpath.query.php

Answered by EPB
  • When I tried that solution, I seemed to get the same results when looping over the node list... I keep getting the values for the first listing. – developarvin May 24 '13 at 05:09
  • What XPath query are you using inside the loop? – EPB May 24 '13 at 05:12
  • I went ahead and edited with an example query I had used initially to test my answer. Starting with the `.` is important if you plan to use `//` to start the query, because a query starting with `//` always starts from the document root.
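The difference that comment describes can be checked directly. A location path beginning with / or // is absolute: it starts at the root of the document containing the context node, so passing $node has no effect on it. That would explain always getting the first listing's values when taking item(0) of a '//h3' result. A minimal sketch with two stripped-down listings (the markup is an assumption):

```php
<?php
// Two simplified listings (assumption).
$html = '<div class="listing"><div class="foo"><h3>Get me 1</h3></div></div>'
      . '<div class="listing"><div class="foo"><h3>Get me 2</h3></div></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$secondListing = $xpath->query("//div[@class='listing']")->item(1);

// "//h3" starts at the document root, so it matches every h3 in the document.
$absolute = $xpath->query('//h3', $secondListing);
// ".//h3" starts at the context node and only searches its descendants.
$relative = $xpath->query('.//h3', $secondListing);

echo $absolute->length, PHP_EOL;             // 2
echo $relative->length, PHP_EOL;             // 1
echo $relative->item(0)->nodeValue, PHP_EOL; // Get me 2
```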

For completeness, there is also a DOMNode::getNodePath method, which returns an XPath expression for that node. So you can also use $x_path->query($node->getNodePath().'//h3')
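A minimal sketch of that approach, assuming simplified markup. getNodePath() returns an absolute, position-based path, so the query built from it needs no context-node argument; the trade-off is that each query is re-evaluated from the document root.

```php
<?php
// Simplified markup with two listings (assumption).
$html = '<div id="content"><div class="listing"><h3>Get me 1</h3></div>'
      . '<div class="listing"><h3>Get me 2</h3></div></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$found = [];
foreach ($xpath->query("//div[@class='listing']") as $node) {
    // Absolute path to this node, e.g. /html/body/div/div[2] for the second listing.
    $path = $node->getNodePath();
    // Appending //h3 restricts the search to descendants of that node.
    $h3 = $xpath->query($path . '//h3')->item(0);
    $found[] = $h3->nodeValue;
    echo $path, ' => ', $h3->nodeValue, PHP_EOL;
}
```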

Answered by Fei