I have the following code:

        <?php
        $doc = new DOMDocument;
        $doc->loadHTML('<html>
                       <head>
                        <title>bar , this is an example</title>
                       </head>
                       <body>
                       <h1>latest news</h1>
                       foo <strong>bar</strong>
                       <i>foobar</i>
                       </body>
                       </html>');

        $xpath = new DOMXPath($doc);
        // Match every element whose first text node contains "bar"
        foreach ($xpath->query('//*[contains(child::text(),"bar")]') as $e) {
            echo $e->tagName, "\n";
        }

It prints:

        title
        strong
        i

This code finds every HTML element whose text contains "bar", so it also matches words that merely include "bar", such as "foobar". I want to change the query so that it matches only the standalone word "bar", without any prefix or suffix.

I think this can be solved by changing the query to match "bar" only where it has no letter directly before or after it, or where it has a space before or after it.

This code is from an answer by VolkerK to an earlier question here.

Thanks

ahmed
  • Reference: [Using regex to filter attributes in xpath with php](http://stackoverflow.com/q/6823032/367456) (Jul 2011) – hakre Aug 17 '15 at 05:32
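
The referenced question points at another route: PHP's DOMXPath can expose PHP functions to XPath, so `preg_match` with word boundaries can do the whole-word test. The following is only a sketch under that assumption, run against a compact copy of the sample document; the variable names are illustrative.

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<html><head><title>bar , this is an example</title></head>'
    . '<body><h1>latest news</h1>foo <strong>bar</strong> <i>foobar</i></body></html>');
$xpath = new DOMXPath($doc);

// Expose PHP functions to XPath under the "php" prefix.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('preg_match');

// \b is a regex word boundary, so "foobar" does not match.
// text()[...] tests each direct text node, so ancestor elements
// such as <html> or <body> are not reported for their descendants' text.
// The "> 0" comparison turns preg_match's result into a boolean.
$query = '//*[text()[php:functionString("preg_match", "/\bbar\b/", .) > 0]]';

$matches = [];
foreach ($xpath->query($query) as $e) {
    $matches[] = $e->tagName;
}
print_r($matches);
```

Note that `\b` treats digits and underscores as word characters, so "bar_1" would not count as the word "bar" here.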

2 Answers

You can use the following XPath query:

        $xpath->query("//*[text()='bar']");

or

        $xpath->query("//*[.='bar']");

Note that using "//" will slow things down, and more so the bigger your XML file is.
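
To see what these exact-match queries return on the sample document from the question, here is a minimal sketch. The helper name `tagNames` is made up for illustration, and the third query shows one possible way to narrow the leading "//" by starting the search at `/html/body`.

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<html><head><title>bar , this is an example</title></head>'
    . '<body><h1>latest news</h1>foo <strong>bar</strong> <i>foobar</i></body></html>');
$xpath = new DOMXPath($doc);

// Hypothetical helper: run a query and collect the matched tag names.
function tagNames(DOMXPath $xpath, string $query): array
{
    $names = [];
    foreach ($xpath->query($query) as $e) {
        $names[] = $e->tagName;
    }
    return $names;
}

// Exact match on a direct text node.
print_r(tagNames($xpath, "//*[text()='bar']"));
// Exact match on the element's whole string value.
print_r(tagNames($xpath, "//*[.='bar']"));
// Same test as the first query, but searching only below <body>.
print_r(tagNames($xpath, "/html/body//*[text()='bar']"));
```

All three queries match only `strong`, because the title's text is longer than just "bar".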

null
  • Thanks, but this does not work: it prints "strong", whereas it should print "strong" and "title", because the word "bar" is in the title as well – ahmed Aug 12 '09 at 02:05
  • I thought you wanted to match just "bar"; now I see you want it to match "bar" or "this bar now" but *not* "this foobar now". – null Aug 12 '09 at 17:23

If you are looking for just "bar" with XPath 1.0, you'll have to combine several functions, because there are no regular expressions in XPath 1.0.

        $xpath->query("//*[
                        . = 'bar' or
                        starts-with(., 'bar ') or
                        contains(., ' bar ') or
                        (' bar' = substring(., string-length(.) - string-length(' bar') + 1))
                      ]");

Basically, this locates strings that equal 'bar', start with 'bar ', contain ' bar ', or end with ' bar' (notice the spaces, which keep "foobar" and "barfoo" from matching). ends-with is an XPath 2.0 function, so I substituted code that emulates it, taken from a previous Stack Overflow answer.

If contains(., ' bar ') is not enough, because the text may be "one bar, over" or "This bar. That bar." with punctuation directly after 'bar', you could try this contains instead:

        contains(translate(., '.,[]', '    '), ' bar ') or

translate() maps characters position by position, so the third argument holds four spaces, one for each character of '.,[]'; a character with no counterpart in the third argument would be deleted rather than replaced. This way "one bar, over" becomes "one bar  over" and matches ' bar ' as expected.
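
The separate start, middle, and end checks can also be folded into a single test by padding the text with a space on each side. The sketch below assumes that approach; the helper `wholeWordQuery` and its punctuation list are hypothetical, and it tests `text()` rather than `.` so that ancestor elements such as `html` and `body` are not matched through their descendants' text.

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<html><head><title>bar , this is an example</title></head>'
    . '<body><h1>latest news</h1>foo <strong>bar</strong> <i>foobar</i></body></html>');
$xpath = new DOMXPath($doc);

// Hypothetical helper, not part of the original answer.
// Assumes $word contains no quote characters.
function wholeWordQuery(string $word): string
{
    $punct  = '.,;:!?()[]';                     // treated as word separators
    $spaces = str_repeat(' ', strlen($punct));  // one space per character
    // translate() turns punctuation into spaces, normalize-space() collapses
    // runs of whitespace (including newlines), and concat() pads both ends
    // so a word at the very start or end is also surrounded by spaces.
    return sprintf(
        '//*[text()[contains(concat(" ", normalize-space(translate(., "%s", "%s")), " "), " %s ")]]',
        $punct,
        $spaces,
        $word
    );
}

$matches = [];
foreach ($xpath->query(wholeWordQuery('bar')) as $e) {
    $matches[] = $e->tagName;
}
print_r($matches);
```

On the sample document this should report `title` and `strong` but not `i`, since "foobar" never appears as " bar " after padding.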

null