Is there a way to get the text between two nodes with XPath 1.0?

Example: we want the text between F and D, so the expected result is "G".

    $html = ''.
        '<html>'.
        '<body>'.
        '<a>A</a>'.
        '<b>B
            <c>C
                <F>F</F>
            </c>
            <G>G</G>
        </b>'.
        '<d>D
            <e>E</e>
        </d>'.
        '</body>'.
        '</html>';

Here is the query:

    $dom = new \DOMDocument();
    @$dom->loadHTML($html);
    $xpath = new \DOMXPath($dom);
    $a = '/html/body/b/c/f';
    $b = '/html/body/d';
    $nodesBetween = getNodesBetween($a, $b, $xpath);

Finally the function:

    public function getNodesBetween($a, $b, $domxpath) {
        $query = $a."/following::text()[. = ".$b."/preceding::text()]";
        $elements = $domxpath->query($query);
        $inside = '';
        foreach ($elements as $element) {
            $inside .= $element->nodeValue;
        }
        dd($inside);
    }

Searching from A to D works, and the output is "B C F G". Searching between F and D returns an empty string. It seems to search only among siblings, and since F has none, it stops. The only answer I could find uses XPath 2.0:

"assuming you want nodes at all tree depths between the two h3 elements, which would not necessarily be siblings"

from https://stackoverflow.com/a/3838151/3628541

    /path/to/first/h3/following::node()[. << /path/to/second/h3]

What is the equivalent in XPath 1.0?

Patrick L.

1 Answer

You're looking for the intersection of $A/following::node() with $B/preceding::node().

In XPath 1.0 the intersection of $X and $Y is given by $X[count(.|$Y)=count($Y)].

So that gives you

    $A/following::node()[count(.|$B/preceding::node())=count($B/preceding::node())]

which is likely to have monstrously bad performance.
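
For the PHP setup in the question, this could look roughly like the sketch below. The helper name `textBetween` is made up for illustration, and it assumes `$a` and `$b` are simple absolute location paths (no top-level `|`). It selects `text()` nodes rather than `node()` so that the text of an element and of its child text node is not concatenated twice.

```php
<?php
// Hypothetical helper: concatenates the text nodes that come after $a
// and before $b in document order, using the XPath 1.0 intersection idiom
// $X[count(. | $Y) = count($Y)].
function textBetween(DOMXPath $xpath, string $a, string $b): string
{
    $before = $b . '/preceding::text()';
    $query  = sprintf(
        '%s/following::text()[count(. | %s) = count(%s)]',
        $a,
        $before,
        $before
    );

    $text = '';
    foreach ($xpath->query($query) as $node) {
        $text .= $node->nodeValue;
    }

    // Collapse runs of whitespace so indentation in the markup does not leak through.
    return trim((string) preg_replace('/\s+/', ' ', $text));
}

// Same structure as the question's document, written without whitespace.
$html = '<html><body><a>A</a><b>B<c>C<f>F</f></c><g>G</g></b>'
      . '<d>D<e>E</e></d></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html); // @ hides the warnings about the non-HTML tag names
$xpath = new DOMXPath($dom);

echo textBetween($xpath, '/html/body/b/c/f', '/html/body/d'), "\n"; // G
echo textBetween($xpath, '/html/body/a', '/html/body/d'), "\n";     // BCFG
```

The predicate re-evaluates `$b/preceding::text()` for every candidate node, which is where the poor performance comes from on large documents.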

Michael Kay
  • This is the query as you suggested: `$query = $a."/following::node()[count(.|".$b."/preceding::node())=count(".$b."/preceding::node())]";` but it doesn't work either; it returns an empty string. – Patrick L. Mar 07 '18 at 01:42
  • My mistake, I had made a wrong variable assignment. It should have been `$a = '/html/body/b/c/f';`. Thanks! – Patrick L. Mar 07 '18 at 02:16