
I have an XPath query like the one below. I am trying to match an exact string in the HTML.

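$d = new DOMDocument();
$d->loadHTML('<?xml encoding="UTF-8">'.$html); // $html holds the page markup
$xpath = new DOMXPath($d);
$txt="rest";
$txt=' '.$txt.' ';          // pad with spaces so only the whole word matches
$txt1=strtolower($txt);
// translate() lowercases the letters of the search word, making the match case-insensitive
$xpath->query('/html/body//text()['.
                           'contains('.
                              'translate(.,"'.strtoupper($txt).'","'.$txt1.'"),'.
                              '"'.$txt1.'")'.
                         ']'.'[not(ancestor::a)][not(ancestor::h2)][not(ancestor::h3)]');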

It works if the string looks like this: "Please rest in ....". But if &nbsp; comes before 'rest' instead of a regular space, it does not match: "Please&nbsp;rest in ....".

  • `&nbsp;` and a regular space are different characters: code point 160 versus 32. – Knut Forkalsrud Nov 21 '21 at 19:09
  • Is it possible to treat them the same somehow? – saurabh yadav Nov 21 '21 at 19:16
  • Disclaimer: I'm no XPath expert. If memory serves me, XPath doesn't have a lot of string matching/manipulation capability. You may have to look beyond. Maybe use XPath to find *every* bit of text in the document, and regex to look for the occurrences of "rest" within those. – Knut Forkalsrud Nov 21 '21 at 19:27
  • Actually, see https://stackoverflow.com/questions/393840/locating-the-node-by-value-containing-whitespaces-using-xpath for a better answer. – Knut Forkalsrud Dec 08 '21 at 00:21
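
One way to treat the two characters the same, as asked in the comments, might be to extend the existing translate() call so that it also maps U+00A0 (the character `&nbsp;` decodes to) onto a regular space. The following is only a minimal sketch under some assumptions: the sample `$html` and the variable names `$word`, `$needle`, `$from`, `$to` and `$matches` are made up for illustration, and the `"\u{00A0}"` escape needs PHP 7 or later.

```php
<?php
// Hypothetical sample input: one paragraph uses &nbsp; before "rest", one uses a plain space.
$html = '<html><body><p>Please&nbsp;rest in peace.</p><p>Please rest now.</p>'
      . '<h2>No rest here</h2><p>Nothing to match: restore</p></body></html>';

$d = new DOMDocument();
// Same UTF-8 hint as in the question; @ hides libxml parse warnings.
@$d->loadHTML('<?xml encoding="UTF-8">' . $html);
$xpath = new DOMXPath($d);

$word   = 'rest';
$needle = ' ' . strtolower($word) . ' ';  // space-padded, as in the question
$nbsp   = "\u{00A0}";                     // non-breaking space, code point 160

// translate() replaces the i-th character of its 2nd argument with the
// i-th character of its 3rd argument. Appending the nbsp/space pair makes
// the comparison see a regular space wherever the text has a nbsp.
$from = strtoupper($word) . $nbsp;
$to   = strtolower($word) . ' ';

$query = '/html/body//text()['
       . 'contains(translate(., "' . $from . '", "' . $to . '"), "' . $needle . '")'
       . '][not(ancestor::a)][not(ancestor::h2)][not(ancestor::h3)]';

// Collect the original (untranslated) text of every matching text node.
$matches = [];
foreach ($xpath->query($query) as $node) {
    $matches[] = $node->nodeValue;
}
```

With this sample input, `$matches` should hold the two paragraph texts that contain the word, including the one with the non-breaking space. The translation only affects the comparison, so the returned nodes keep their original characters. Like the query in the question, this still requires a space (or now a non-breaking space) on both sides of the word, so "rest." at the end of a sentence would not match.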
