
I have the following string:

$str = '<div class="hello"> Hello world &lt hello world?! </div>';

I need to find every match of "hello" in the text inside the tag, while avoiding matches in attribute values. I tried something like:

$pattern = '/(.*)(hello)(.*)(?=<\/)/ui'; 
$replacement = '$1<span style="background:yellow">$2</span>$3';

But this highlights only one "hello". What should I do?

HookeD74

2 Answers


(*SKIP)(*F) Syntax in Perl and PCRE (PHP, Delphi, R...)

With all the usual disclaimers about using regex to parse HTML, we can do this with a surprisingly simple regex:

<[^>]*>(*SKIP)(*F)|(hello)

Sample PHP Code:

$replaced = preg_replace('~<[^>]*>(*SKIP)(*F)|(hello)~i',
                        '<span style="background:yellow">$1</span>',
                         $yourstring);

In the regex demo, see the substitutions at the bottom.

Explanation

This problem is a classic case of the technique explained in this question to "regex-match a pattern, excluding..."

The left side of the alternation | matches complete <tags> and then deliberately fails. Because of (*SKIP), the engine does not retry inside the tag but resumes matching right after it. The right side captures hello (case-insensitively) to Group 1, and we know these are the right occurrences because they were not matched by the expression on the left.
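
To see the alternation at work, here is a minimal sketch that runs the pattern on the string from the question. For comparison it also shows the same "match what you don't want first" idea written with preg_replace_callback, which needs no backtracking verbs. The helper names highlightSkipFail and highlightCallback are hypothetical, made up for this illustration only.

```php
<?php
// Hypothetical helper: the (*SKIP)(*F) approach described above.
function highlightSkipFail(string $html, string $word): string
{
    // Left branch: consume a whole tag, then force a failure and skip past it.
    // Right branch: capture the word wherever it appears outside a tag.
    $pattern = '~<[^>]*>(*SKIP)(*F)|(' . preg_quote($word, '~') . ')~i';
    return preg_replace($pattern, '<span style="background:yellow">$1</span>', $html);
}

// Hypothetical helper: the same idea without verbs. Tags are matched too,
// but the callback puts them back unchanged.
function highlightCallback(string $html, string $word): string
{
    $pattern = '~<[^>]*>|(' . preg_quote($word, '~') . ')~i';
    return preg_replace_callback($pattern, function (array $m): string {
        // Group 1 is only filled in when the right-hand branch matched.
        if (($m[1] ?? '') !== '') {
            return '<span style="background:yellow">' . $m[1] . '</span>';
        }
        return $m[0]; // a tag: return it as-is
    }, $html);
}

$str = '<div class="hello"> Hello world &lt hello world?! </div>';
echo highlightSkipFail($str, 'hello'), "\n";
echo highlightCallback($str, 'hello'), "\n";
```

Both helpers wrap Hello and hello in the text and leave class="hello" untouched. Neither handles a literal > inside an attribute value, so the usual regex-on-HTML caveats still apply.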

Reference

zx81

Wrapping text matches into another element is a pretty basic operation, though the code is somewhat tricky:

$html = <<<EOS
<div class="hello"> Hello world &lt; hello world?! </div>
EOS;

$dom = new DOMDocument;
$dom->loadHTML($html);

$search = 'hello';

foreach ($dom->getElementsByTagName('div') as $element) {
    foreach ($element->childNodes as $node) { // iterate all direct children
        if ($node->nodeType === XML_TEXT_NODE) { // and look for text nodes in particular
            // stripos() makes the search case-insensitive, so "Hello" is found as well
            if (($pos = stripos($node->nodeValue, $search)) !== false) {
                // we split the text up in: <prefix> match <postfix>
                $match = substr($node->nodeValue, $pos, strlen($search)); // keeps the original casing
                $postfix = substr($node->nodeValue, $pos + strlen($search));
                $node->nodeValue = substr($node->nodeValue, 0, $pos);

                // insert <postfix> behind the current text node
                $textNode = $dom->createTextNode($postfix);
                if ($node->nextSibling) {
                    $node->parentNode->insertBefore($textNode, $node->nextSibling);
                } else {
                    $node->parentNode->appendChild($textNode);
                }

                // wrap the match in an element and insert it before <postfix>
                $wrapNode = $dom->createElement('span', $match);
                $node->parentNode->insertBefore($wrapNode, $textNode);
            }
        }
    }
}

echo $dom->saveHTML(), "\n";
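
The loop above only looks at text nodes that are direct children of a div. As a rough sketch of how the same DOM approach could cover text at any depth, the variant below selects text nodes with DOMXPath and splits each one with preg_split. The function name wrapMatches, the yellow background style (borrowed from the question) and the extra <b> element in the sample markup are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: wrap every case-insensitive occurrence of $search
// found in any text node below <body>, at any nesting depth.
function wrapMatches(DOMDocument $dom, string $search): void
{
    $xpath = new DOMXPath($dom);
    $pattern = '~(' . preg_quote($search, '~') . ')~i';

    // Copy the result into an array so that replacing nodes cannot
    // disturb the iteration.
    foreach (iterator_to_array($xpath->query('//body//text()')) as $text) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd indexes hold the matches and
        // even indexes hold the text between them.
        $parts = preg_split($pattern, $text->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }

        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $span = $dom->createElement('span');
                $span->setAttribute('style', 'background:yellow');
                $span->appendChild($dom->createTextNode($part));
                $fragment->appendChild($span);
            } elseif ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        // Swap the original text node for the rebuilt pieces.
        $text->parentNode->replaceChild($fragment, $text);
    }
}

$dom = new DOMDocument;
$dom->loadHTML('<div class="hello"> Hello <b>hello</b> world &lt; hello world?! </div>');
wrapMatches($dom, 'hello');
echo $dom->saveHTML(), "\n";
```

Because only text nodes are touched, the class="hello" attribute is never a candidate for a match. Note that this sketch would also wrap matches inside script or style elements if the document had any, so a real page may need a narrower XPath expression.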
Ja͢ck