I want to use an XPath query to retrieve the text "Testing" only once from the following test.html:

<html>
    <body>
        <div class="test1"></div>
        <div class="test2">
            <div><strong>Testing</strong></div>
        </div>
    </body>
</html>

Here is the PHP code I used to retrieve the content:

$uri = 'test.html';
$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadHTMLFile($uri);
$xpath = new DOMXPath($doc);
$path = "/html/body/div[2]//*"; // every descendant element of the second div
$elements = $xpath->query($path);

// DOMXPath::query() returns false (not null) when the expression is invalid
if ($elements !== false) {
    foreach ($elements as $element) {
        echo '<br>[' . $element->nodeName . ']';
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            $nodeValue = $node->nodeValue;
            echo $nodeValue;
        }
    }
}

Here is the result I got.

[div] Testing 
[strong] Testing

Why does it print "Testing" for the [div] node as well? I want to retrieve "Testing" only from the [strong] node.

James Zhao

2 Answers

That's just how it works: the nodeValue of an element node contains the text of all of its descendants, so the nodeValue of a parent always includes the nodeValues of its children.

nodeValue doesn't quite fit your goal. Collect only the text nodes among the element's direct children instead. See this question: Getting node's text in PHP DOM.
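
As a rough illustration of the text-node approach, here is a minimal sketch. It inlines the markup from the question instead of loading test.html, and `ownText` is a hypothetical helper name, not part of PHP's DOM extension. It keeps only the text nodes that are direct children of each matched element:

```php
<?php
// Same markup as test.html in the question, inlined so the sketch is self-contained.
$html = <<<HTML
<html>
    <body>
        <div class="test1"></div>
        <div class="test2">
            <div><strong>Testing</strong></div>
        </div>
    </body>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: concatenates only the text nodes that are
// direct children of $element, ignoring text inside nested elements.
function ownText(DOMNode $element): string
{
    $text = '';
    foreach ($element->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    return trim($text);
}

// Map each matched element name to its own (direct) text.
$results = [];
foreach ($xpath->query('/html/body/div[2]//*') as $element) {
    $results[$element->nodeName] = ownText($element);
}

foreach ($results as $name => $text) {
    echo '[' . $name . '] ' . $text . PHP_EOL;
}
```

With this markup the inner div has no text node of its own, so only the strong entry should carry "Testing".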

Inglis Baderson

Your XPath /html/body/div[2]//* selects every descendant element of div[2] at any depth, so it matches both the inner div (a child) and the strong (a grandchild).

To select only the grandchildren, use /html/body/div[2]/*/*
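
A minimal sketch of that narrower query, assuming the same markup as the question (inlined here as a string rather than read from test.html; the `$matches` variable is only for illustration):

```php
<?php
// Markup from the question, inlined so the sketch runs on its own.
$html = '<html><body><div class="test1"></div>'
      . '<div class="test2"><div><strong>Testing</strong></div></div>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// /*/* steps exactly two levels below div[2], so the inner div is skipped.
$matches = [];
foreach ($xpath->query('/html/body/div[2]/*/*') as $element) {
    $matches[] = [$element->nodeName, $element->textContent];
}

foreach ($matches as [$name, $text]) {
    echo '[' . $name . '] ' . $text . PHP_EOL;
}
```

This relies on the text always sitting exactly two levels below div[2]; if the nesting depth varies, a path that names the element, such as /html/body/div[2]//strong, may be a better fit.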

Bill Velasquez