The functions below do a replace on the content (which is HTML markup), wrapping bold and em tags around the first two occurrences of the keyword that they find.

The one case I need to account for, though, is when the keyword is already inside an h1 tag; in that case I don't want the callback to run.

Example:

<h1>this is the keyword inside of a heading tag</h1>

After replacement

<h1>this is the <b>keyword</b> inside of a heading tag</h1>

How might I alter the replacement so that it skips over keywords that appear inside a heading tag (h1-h6) and moves on to the next match?

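function doReplace($matches)
{
    // Counts matches across calls: the first gets <b>, the second gets <em>,
    // and every later match is returned unchanged.
    static $count = 0;
    switch ($count++) {
        case 0: return ' <b>'.trim($matches[1]).'</b>';
        case 1: return ' <em>'.trim($matches[1]).'</em>';
        default: return $matches[1];
    }
}

function save_content($content)
{
    $mykeyword = "test";
    // strpos() returns false when nothing is found, so compare strictly against false.
    if ((strpos($content, "<b>".$mykeyword) !== false ||
         strpos($content, "<strong>".$mykeyword) !== false) &&
        strpos($content, "<em>".$mykeyword) !== false)
    {
        // The keyword is already bolded and emphasized; leave the content alone.
        return $content;
    }
    else
    {
        $theContent = preg_replace_callback("/\b(?<!>)($mykeyword)\b/i", "doReplace", $content);
        return $theContent;
    }
}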
Scott B
  • This may be appropriate: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – EboMike Nov 05 '10 at 01:27
  • @EboMike It is appropriate for 99% of questions tagged [html] and [regex] :) – alex Nov 05 '10 at 01:29
  • Perhaps I should use xpath, however, I can't find an xpath example that also does recursive find/replace. – Scott B Nov 05 '10 at 01:35

1 Answer

Don't use regexes for HTML/XML:

$d = new DOMDocument();
$d->loadHTML($your_html);
$x = new DOMXpath($d);
foreach($x->query("//text()[
   contains(.,'keyword')
   and not(ancestor::h1) 
   and not(ancestor::h2) 
   and not(ancestor::h3) 
   and not(ancestor::h4) 
   and not(ancestor::h5) 
   and not(ancestor::h6)]") as $node){
    // $node is a DOMText node outside any heading; do with it as you like
}
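
As a rough sketch of how the loop body might be filled in for the original goal (bold the first match, emphasize the second, skip headings), something like the following could work. The function name `highlightKeyword` and the wrapper id `kw-root` are made up for illustration, the code assumes PHP 7+ and ASCII/Latin-1 input (no encoding handling is attempted), and it splits each text node with `preg_split` rather than running a regex over the markup:

```php
<?php
// Hypothetical helper (not from the original post): wraps the first keyword
// occurrence outside h1-h6 in <b> and the second in <em>.
function highlightKeyword(string $html, string $keyword): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // Assumed wrapper div so the fragment can be read back out afterwards.
    $doc->loadHTML('<div id="kw-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="kw-root"]')->item(0);
    // All text nodes under the wrapper that have no heading ancestor.
    $textNodes = $xpath->query(
        './/text()[not(ancestor::h1 or ancestor::h2 or ancestor::h3'
        . ' or ancestor::h4 or ancestor::h5 or ancestor::h6)]',
        $root
    );

    $pattern = '/\b(' . preg_quote($keyword, '/') . ')\b/i';
    $tags = ['b', 'em']; // tag for the 1st match, then the 2nd
    $count = 0;

    // Copy the node list to an array first, since the loop edits the tree.
    foreach (iterator_to_array($textNodes) as $node) {
        if ($count >= count($tags)) {
            break;
        }
        // With DELIM_CAPTURE, odd indexes hold the matched keyword text.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no keyword in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1 && $count < count($tags)) {
                $el = $doc->createElement($tags[$count++]);
                $el->appendChild($doc->createTextNode($part));
                $fragment->appendChild($el);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the wrapper's children, dropping the html/body scaffolding.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlightKeyword(
    '<h1>A test heading</h1><p>First test, second Test, third test.</p>',
    'test'
);
```

With that input the heading should come back untouched, while the first two matches in the paragraph get the `<b>` and `<em>` wrappers. The serializer may insert line breaks between block-level elements, so compare output loosely.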
Wrikken
  • trying to test this, but I can't get anything to echo inside {}. Are you able to? – Scott B Nov 05 '10 at 12:04
  • Thanks! What can I echo after the // to see the output? Trying `echo $node` returns "Object of class DOMText could not be converted to string" – Scott B Nov 05 '10 at 14:38
  • Either `$node->textContent` or `$node->ownerDocument->saveXML($node);` for more complex (non-DOMText) nodes. – Wrikken Nov 05 '10 at 15:03
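
To make the two suggestions in the last comment concrete, here is a small self-contained sketch. The sample markup and the shortened XPath (only `h1` is excluded) are assumptions chosen to keep it brief:

```php
<?php
$d = new DOMDocument();
$d->loadHTML('<h1>keyword in heading</h1><p>a keyword in <i>body</i> text</p>');
$x = new DOMXpath($d);

$texts = [];
foreach ($x->query("//text()[contains(., 'keyword') and not(ancestor::h1)]") as $node) {
    // A DOMText node cannot be echoed directly; read its textContent instead.
    $texts[] = $node->textContent;
    echo $node->textContent, "\n";
    // For a non-text node (here the parent element), serialize it with saveXML().
    $parentMarkup = $node->ownerDocument->saveXML($node->parentNode);
    echo $parentMarkup, "\n";
}
```

Only the text node inside the paragraph should match here, since the one in the heading is filtered out by `not(ancestor::h1)`.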