PHP Search Text Highlight Function

Question

I have a PHP highlighting function which makes certain words bold.

Below is the function. It works great, except when the array $words contains the single-letter value b.

For example someone searches for: jessie j price tag feat b o b

This will have the following entries in the array $words: jessie,j,price,tag,feat,b,o,b

When a 'b' shows up, the whole function goes wrong and outputs a bunch of broken HTML tags. Of course I could strip any 'b' values out of the array, but that isn't ideal, because the highlighting then doesn't work as it should for certain queries.

This sample script:

    function highlightWords2($text, $words)
    {
        $text =  ($text);
        foreach ($words as $word)
        {       
            $word = preg_quote($word);

            $text = preg_replace("/\b($word)\b/i", '<b>$1</b>', $text);

        }
        return $text;
    }


    $string = 'jessie j price tag feat b o b';

    $words = array('jessie','tag','b','o','b');

    echo highlightWords2($string, $words);

Will output:

<<<b>b</b>><b>b</b></<b>b</b>>>jessie</<<b>b</b>><b>b</b></<b>b</b>>> j price <<<b>b</b>><b>b</b></<b>b</b>>>tag</<<b>b</b>><b>b</b></<b>b</b>>> feat <<b>b</b>><b>b</b></<b>b</b>> <<b>b</b>>o</<b>b</b>> <<b>b</b>><b>b</b></<b>b</b>>

And this only happens because there are "b"'s in the array.

Can you guys see anything that I could change to make it work properly?

I found it online somewhere, but I've actually just solved my problem. If I change the <b> and </b> to <strong> and </strong>, then it works perfectly. The \b's in the preg_replace must have been playing up with the <b> and </b> tags. – Mr.Boon, Dec 19 '11 at 16:58
Is there a need to highlight words like carport in `carport` or `carport` as well? – hakre, Dec 19 '11 at 18:38

score 5 · Accepted Answer · edited Aug 02 '13 at 14:28

Your problem is that when your function goes through the text looking for every "b" to make bold, it also sees the "b" in the bold tags it has already inserted and tries to bold those as well.

@symcbean was close but forgot one thing.

$string = 'jessie j price tag feat b o b';
$words = array('jessie','tag','b','o','b');

print hl($string, $words);

function hl($inp, $words)
{
  $replace=array_flip(array_flip($words)); // remove duplicates
  $pattern=array();
  foreach ($replace as $k=>$fword) {
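     // preg_quote() escapes regex metacharacters in the search term;
     // (?!>) skips a word directly followed by ">", i.e. the b in <b> and </b>
     $pattern[]='/\b(' . preg_quote($fword, '/') . ')(?!>)\b/i';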
     $replace[$k]='<b>$1</b>';
  }
  return preg_replace($pattern, $replace, $inp);
}

Notice the added "(?!>)". It is a negative lookahead assertion: the word only matches if it is not immediately followed by a ">", which is exactly what follows the "b" in both the opening and the closing bold tag. I check for ">" after the word rather than "<" before it, because a check before the word would not exclude the closing tag, where the "b" is preceded by "/". With this change the code above works as expected.
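To see the lookahead in isolation, here is a minimal sketch. It uses the same pattern the function above builds for the word "b"; the sample strings and variable names are made up for illustration. The second call shows the limit of the trick: it only protects tag names, not attribute values.

```php
<?php
// The pattern hl() builds for the search word "b".
$pattern = '/\b(b)(?!>)\b/i';

// "<b>tag</b>" stands for what an earlier replacement left behind.
$text = '<b>tag</b> feat b o b';

// Only the two standalone b's are wrapped. The b inside <b> and </b>
// is directly followed by ">" and is therefore skipped.
$result = preg_replace($pattern, '<b>$1</b>', $text);
echo $result, "\n"; // <b>tag</b> feat <b>b</b> o <b>b</b>

// Limitation: a b inside an attribute value is followed by a quote,
// not by ">", so it still gets wrapped and the markup breaks.
$attr = preg_replace($pattern, '<b>$1</b>', '<a class="b">x</a>');
echo $attr, "\n"; // <a class="<b>b</b>">x</a>
```

So the lookahead is enough for plain-text input that only ever gains <b> tags from this function, but not for input that already contains arbitrary HTML.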

edited Aug 02 '13 at 14:28 by Mena
answered Dec 19 '11 at 17:18 by JoshStrange

Was about to post the same thing. I would also recommend using preg_replace with arrays instead, as that limits the amount of read throughs of the text to one time, eliminating complexity and increasing speed. – saccharine Dec 19 '11 at 17:32
What if the text has HTML attributes that contain a search term? Or HTML comments? Or javascript? – hakre Dec 19 '11 at 18:34
@hakre Yes, that would cause problems: if you had Blah it would make it Blah. I am not 100% sure how to combat that other than cleaning the input of all HTML before running it through the highlighting function. – JoshStrange Dec 19 '11 at 19:08
thank you very much @JoshStrange Sir... this save me a time! :) – Mohammed Sufian Feb 25 '14 at 22:27
To avoid what @JoshStrange describes with text inside an HTML **tag**, you can probably use `strip_tags()` only for matching with the regex. I also had trouble with entities in the search word (which should be highlighted with a CSS class) and came up with something like: `$decoded = html_entity_decode($words, ENT_COMPAT, 'UTF-8');`. – Roland Nov 30 '17 at 10:42
Oh, `preg_replace()` is used. Then `strip_tags()` will eliminate them not only for regex testing. My bad. Maybe still acceptable? – Roland Nov 30 '17 at 10:43

score 2 · Answer 2 · edited May 23 '17 at 12:16

Your basic problem is that you replace plain-text strings inside HTML indiscriminately. With short search terms this breaks quickly, because the replacement also hits tag names and attributes.

Instead, apply the search and replace only to the text between HTML tags. You also don't want to highlight inside an existing highlight.

Regular expressions are quite limited for this kind of job. Use an HTML parser instead; in PHP that is, for example, DOMDocument. With an HTML parser you can search only inside the text nodes and leave tags, attributes and comments alone.

A previous answer of mine contains a text highlighter along with a detailed description of how it works. The question there is "Ignore html tags in preg_replace", which is quite similar to yours, so this snippet is probably helpful. It uses <span> instead of <b> tags:

$doc = new DOMDocument;
$doc->loadXML($str);
$xp = new DOMXPath($doc);

$anchor = $doc->getElementsByTagName('body')->item(0);
if (!$anchor)
{
    throw new Exception('Anchor element not found.');
}

// search elements that contain the search-text
$r = $xp->query('//*[contains(., "'.$search.'")]/*[FALSE = contains(., "'.$search.'")]/..', $anchor);
if (!$r)
{
    throw new Exception('XPath failed.');
}

// process search results
foreach($r as $i => $node)
{   
    $textNodes = $xp->query('.//child::text()', $node);

    // extract $search textnode ranges, create fitting nodes if necessary
    $range = new TextRange($textNodes);        
    $ranges = array();
    while(FALSE !== $start = strpos($range, $search))
    {
        $base = $range->split($start);
        $range = $base->split(strlen($search));
        $ranges[] = $base;
    }

    // wrap each matching text node
    foreach($ranges as $range)
    {
        foreach($range->getNodes() as $node)
        {
            $span = $doc->createElement('span');
            $span->setAttribute('class', 'search_hightlight');
            $node = $node->parentNode->replaceChild($span, $node);
            $span->appendChild($node);
        }
    }
}

If you adapt it for multiple search terms, I would add an additional class with a number that depends on the search term, so each term can be styled in a different color with CSS.

You should also remove duplicate search terms and make the XPath expression skip text that is already inside an element carrying the highlight span.
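The snippet above relies on a TextRange class that is not built into PHP, so it does not run on its own. As a self-contained variation on the same idea, here is a rough sketch that handles several terms, removes duplicates and adds a numbered class per term. It is not the code from the linked answer: the function name `highlightHtml`, the wrapper id `hl-root` and the `hl hl-N` class scheme are assumptions made for this example, and it assumes PHP 7.4+ with the DOM extension and ASCII search terms.

```php
<?php
// Hypothetical helper: highlight search terms in text nodes only.
function highlightHtml(string $html, array $words): string
{
    // Lower-case, de-duplicate and drop empty terms (ASCII terms assumed).
    $words = array_values(array_filter(array_unique(array_map('strtolower', $words)), 'strlen'));
    if (!$words) {
        return $html;
    }
    $quoted  = array_map(fn($w) => preg_quote($w, '/'), $words);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    $prev = libxml_use_internal_errors(true); // silence warnings on sloppy HTML
    $doc  = new DOMDocument();
    // The XML declaration prefix is a common trick to make loadHTML read UTF-8;
    // the wrapper div gives us one known root to serialize afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="hl-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xp   = new DOMXPath($doc);
    $root = $xp->query('//div[@id="hl-root"]')->item(0);

    // Snapshot the text nodes first, because the loop below modifies the tree.
    $query = './/text()[not(ancestor::script or ancestor::style)]';
    foreach (iterator_to_array($xp->query($query, $root)) as $node) {
        // With one capture group, odd indexes hold the matched terms.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no search term in this text node
        }
        $frag = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 0) {
                if ($part !== '') {
                    $frag->appendChild($doc->createTextNode($part));
                }
                continue;
            }
            $idx  = array_search(strtolower($part), $words);
            $span = $doc->createElement('span');
            $span->setAttribute('class', $idx === false ? 'hl' : 'hl hl-' . $idx);
            $span->appendChild($doc->createTextNode($part));
            $frag->appendChild($span);
        }
        $node->parentNode->replaceChild($frag, $node);
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$out = highlightHtml('<a href="b.html" title="b">b o b</a>', ['b', 'o', 'b']);
echo $out, "\n";
```

Because all terms are matched in a single pass over text nodes collected up front, the inserted spans are never searched again, and the "b" in the href and title attributes stays untouched.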

score 0 · Answer 3 · answered Dec 19 '11 at 17:06

If it were me, I'd have used JavaScript.

But with PHP, since the problem only seems to be the duplicate entries in the search, just remove them. You can also run preg_replace once rather than multiple times:

$string = 'jessie j price tag feat b o b';
$words = array('jessie','tag','b','o','b');

print hl($string, $words);

function hl($inp, $words)
{
  $replace=array_flip(array_flip($words)); // remove duplicates
  $pattern=array();
  foreach ($replace as $k=>$fword) {
     $pattern[]='/\b(' . $fword . ')\b/i';
     $replace[$k]='<b>$1</b>';
  }
  return preg_replace($pattern, $replace, $inp);
}
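Note that preg_replace with an array of patterns still applies them one after another, so a later pattern sees the tags an earlier one inserted. A possible way to get a true single pass is to join all terms into one alternation. The sketch below assumes plain-text input and PHP 7.4+; the function name `highlightOnce` is made up for this example.

```php
<?php
// Hypothetical helper: highlight all terms with one regex in one pass.
function highlightOnce(string $text, array $words): string
{
    // Drop duplicates and empty strings.
    $words = array_filter(array_unique($words), 'strlen');
    if (!$words) {
        return $text;
    }
    // Longest first, so a multi-word term wins over a shorter term it starts with.
    usort($words, fn($a, $b) => strlen($b) - strlen($a));
    // Escape regex metacharacters, including the "/" delimiter.
    $quoted = array_map(fn($w) => preg_quote($w, '/'), $words);

    // One alternation, one pass: the inserted <b> tags are never scanned again.
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    return preg_replace($pattern, '<b>$1</b>', $text) ?? $text;
}

$once = highlightOnce('jessie j price tag feat b o b', ['jessie', 'tag', 'b', 'o', 'b']);
echo $once, "\n"; // <b>jessie</b> j price <b>tag</b> feat <b>b</b> <b>o</b> <b>b</b>
```

Since the replacement text is never fed back into the regex, a search term of "b" cannot collide with the bold tags. Input that already contains HTML is a different problem and still needs a parser.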
