How would I modify the script below so that, if the first appearance of the keyword in the content string is already enclosed in `<b>` or `<strong>`, the node replacement is skipped?

    $keyword = "test";

    $content = "this is a <strong>test</strong> phrase with the word \"test\" in it.
                in this example, nothing would be changed, since the first 
                appearance of the keyword is already in boldface";

    $d = new DOMDocument();
    // Suppress parser warnings caused by the HTML fragment
    @$d->loadHTML($content);
    $x = new DOMXPath($d);
    @$nodes = $x->query("//text()[contains(.,'$keyword') and not(ancestor::h1) and not(ancestor::h2) and not(ancestor::h3) and not(ancestor::h4) and not(ancestor::h5) and not(ancestor::h6) and not(ancestor::b) and not(ancestor::strong)]");
    if ($nodes && $nodes->length) {
        $node = $nodes->item(0);
        // Split just before the keyword
        $keynode = $node->splitText(strpos($node->textContent, $keyword));
        // Split after the keyword
        $node->nextSibling->splitText(strlen($keyword));
        // Replace keyword with <strong>keyword</strong>
        $replacement = $d->createElement('strong', $keynode->textContent);
        $keynode->parentNode->replaceChild($replacement, $keynode);
    }
    echo $d->saveHTML();
Scott B
  • If I understand correctly, you want to skip the node replacement if the keyword is already enclosed in a `<b>` or `<strong>`? Well, if so, you're already doing that by using the following XPath expression: `and not(ancestor::b) and not(ancestor::strong)`. – netcoder Feb 02 '11 at 19:04
  • The way it currently works is that on the first time the doc is saved, the script encloses the first keyword in bold. Then when the document is saved a second time, the 2nd appearance of the keyword is placed in boldface. In that case, I want it to exit before it has a chance to bold the 2nd appearance of the keyword since the first appearance was already bolded. – Scott B Feb 02 '11 at 19:18
  • Why are you trying to save the document multiple times? – salathe Feb 02 '11 at 20:29
  • @salathe - I probably should have clarified that. It's a WordPress post. The document can be edited and saved multiple times. – Scott B Feb 02 '11 at 22:59
  • @Scott B, Why not leave the post untouched and just do the highlighting when it is written to the page? – salathe Feb 03 '11 at 07:44
  • @salathe - It'd be super easy to do that. I guess I'm just trying to be different :) That, plus the idea that if I do it once and save it to the database, it never has to be done again. Otherwise, every page load adds a tiny bit of processor activity (over time, hundreds or thousands of times) for something that could be done once, at edit time. – Scott B Feb 03 '11 at 14:46

1 Answer

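In that specific case, use `evaluate` instead of `query`, and change the XPath to count the text nodes that contain the keyword and are already inside `<b>` or `<strong>`:

    "count(//text()[contains(.,'$keyword') and (ancestor::b or ancestor::strong)])"

Unlike `query`, `evaluate` can return a typed result such as the number produced by `count()`. If that returns a value greater than 0, the keyword is already enclosed, so skip the replacement. You have to run this check before the other query.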
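
The two steps could be combined roughly as follows. This is only a sketch: the function name `boldFirstKeyword` is made up for illustration, and it assumes the keyword contains no single quote, because the keyword is interpolated directly into the XPath string. Note that the count check treats any bolded occurrence of the keyword as sufficient, not strictly the first one.

```php
<?php
// Hypothetical helper (the name is not from the original post).
// Assumes $keyword contains no single quote, since it is interpolated into XPath.
function boldFirstKeyword(string $content, string $keyword): string
{
    $d = new DOMDocument();
    @$d->loadHTML($content); // suppress warnings from fragment markup
    $x = new DOMXPath($d);

    // Check first: is any occurrence of the keyword already inside <b> or <strong>?
    $bolded = $x->evaluate(
        "count(//text()[contains(.,'$keyword') and (ancestor::b or ancestor::strong)])"
    );
    if ($bolded > 0) {
        return $d->saveHTML(); // already highlighted, so change nothing
    }

    // Otherwise run the original query. The b/strong exclusions are dropped
    // here because the count above was 0; headings are still excluded.
    $nodes = $x->query(
        "//text()[contains(.,'$keyword') and not(ancestor::h1) and not(ancestor::h2)"
        . " and not(ancestor::h3) and not(ancestor::h4) and not(ancestor::h5)"
        . " and not(ancestor::h6)]"
    );
    if ($nodes && $nodes->length) {
        $node = $nodes->item(0);
        // Split just before the keyword, then just after it
        $keynode = $node->splitText(strpos($node->textContent, $keyword));
        $keynode->splitText(strlen($keyword));
        // Replace the isolated keyword with <strong>keyword</strong>
        $strong = $d->createElement('strong', $keynode->textContent);
        $keynode->parentNode->replaceChild($strong, $keynode);
    }
    return $d->saveHTML();
}
```

Calling the function on its own output should leave the markup unchanged, which is the behaviour wanted when a post is saved repeatedly.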

Gordon
  • Works perfectly, Gordon, although I'm still struggling to make the search case-insensitive. If the keyword is "Foo Bar" and I have "foo bar", it still returns 0. – Scott B Feb 02 '11 at 23:15
  • @Scott you can either try with that translate function everyone suggested or use [`registerPHPFunctions`](http://de3.php.net/manual/en/domxpath.registerphpfunctions.php) and then use PHP's `stripos` in the query. Example: [case insensitive xpath searching in php](http://stackoverflow.com/questions/3238989/case-insensitive-xpath-searching-in-php/3240155#3240155) – Gordon Feb 02 '11 at 23:21
  • I found the problem. I was adding an empty space before the keyword, which was throwing off the whole thing. Got it now, thanks to your help! – Scott B Feb 02 '11 at 23:27
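
For the case-insensitive matching raised in the comments, one option is the XPath 1.0 `translate` function, which can lowercase the node text before `contains` is applied. The sketch below is an illustration under assumptions: the helper name `countBoldedKeyword` is hypothetical, only ASCII letters are folded, and the keyword is assumed to contain no single quote.

```php
<?php
// Hypothetical helper: counts text nodes inside <b>/<strong> that contain
// $keyword, ignoring ASCII case. Assumes $keyword has no single quote.
function countBoldedKeyword(DOMXPath $x, string $keyword): int
{
    $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $lower = 'abcdefghijklmnopqrstuvwxyz';
    $needle = strtolower($keyword);
    // translate() maps each uppercase letter in the node text to lowercase
    $expr = "count(//text()[contains(translate(., '$upper', '$lower'), '$needle')"
          . " and (ancestor::b or ancestor::strong)])";
    return (int) $x->evaluate($expr);
}

$d = new DOMDocument();
@$d->loadHTML('<p>Some <strong>foo bar</strong> text</p>');
$x = new DOMXPath($d);
echo countBoldedKeyword($x, 'Foo Bar'), "\n";
```

Trimming the keyword before building the expression would also guard against the stray leading space mentioned in the last comment.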