This question is related to one I asked before, but since that topic is now closed and I need to ask a follow-up, I am starting a new question. I hope that's fine.

In my previous question I oversimplified the problem, which led to simple but not fully working solutions. I only realized this recently, while implementing my code.

The problem with the solutions in the previous post is that the replacing functions break the HTML tags. I have read in many posts on this site that I need to use a DOM parser. I am very unfamiliar with this, so I tried the code suggested by the user “ircmaxell” in this post, but it does not work for me.

Here is a sample of what I did:

echo '<style type="text/css">
       .ht{
         background-color: yellow;
       }
     </style>'; 


/* taken from user ircmaxell at https://stackoverflow.com/questions/4081372/highlight-keywords-in-a-paragraph

I just modified line $highlight->setAttribute('class', 'highlight') to $highlight->setAttribute('class', 'ht') and commented the first 2 lines   */

function highlight_paragraph($string, $keyword) {
  //$string = '<p>foo<b>bar</b></p>';
  //$keyword = 'foo';
  $dom = new DomDocument();
  $dom->loadHtml($string);
  $xpath = new DomXpath($dom);
  $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
  foreach ($elements as $element) {
   foreach ($element->childNodes as $child) {
     if (!$child instanceof DomText) continue;
     $fragment = $dom->createDocumentFragment();
     $text = $child->textContent;
     $stubs = array();
     while (($pos = stripos($text, $keyword)) !== false) {
       $fragment->appendChild(new DomText(substr($text, 0, $pos)));
       $word = substr($text, $pos, strlen($keyword));
       $highlight = $dom->createElement('span');
       $highlight->appendChild(new DomText($word));
       $highlight->setAttribute('class', 'ht');
       $fragment->appendChild($highlight);
       $text = substr($text, $pos + strlen($keyword));
     }
     if (!empty($text)) $fragment->appendChild(new DomText($text));
     $element->replaceChild($fragment, $child);
   }
 }
 $string = $dom->saveXml($dom->getElementsByTagName('body')->item(0)->firstChild);
 return $string;
}


$string = '<p>This book has been written against a background of both reckless optimism and reckless despair.</p>
<p>It holds that Progress and Doom are two sides of the same medal; that both are articles of superstition, not of faith. It was written out of the conviction that it should be possible to discover the hidden mechanics by which all traditional elements of our political and spiritual world were dissolved into a conglomeration where everything seems to have lost specific value, and has become unrecognizable for human comprehension, unusable for human purpose.</p>
<p> Hannah Arendt, The Origins of Totalitarianism (New York: Harcourt Brace Jovanovich, Inc., 1973 ed.), p.vii, Preface to the First Edition.</p>';

$keywords = array('This', 'book', 'has', 'been', 'written', 'background', 'reckless', 'optimism', 'despair.', 'holds', 'Progress', 'Doom ', 'two', 'sides', 'medal;', 'articles', 'superstition,', 'faith.', 'lost', 'Arendt,', 'Totalitarianism');

foreach ($keywords as $kw) {
  $string = highlight_paragraph($string, $kw);
}

echo $string;

echo $string only returns:

This book has been written against a background of both reckless optimism and reckless despair.

And only the first two words, 'This' and 'book', are highlighted.

It should have output the entire initial string with all the keywords highlighted.

I have searched Stack Overflow and Google extensively and did not find easy-to-use code for this, even though many people have asked the same thing before.

I would really appreciate some help. Thanks in advance!

GP_

2 Answers

You are lucky that I was very bored when I saw this question. ;)

The code you received as an answer doesn't seem to have been tested; I don't know how it could possibly have worked correctly. Anyway, I fixed all the problems and present a working version below, tested on my locally installed Apache server with PHP 5.3:

function highlight_paragraph($string, $keyword) {
  $dom = new DOMDocument();
  $dom->loadHTML($string);

  // Select the direct text children of every element whose content contains the keyword
  $xpath = new DOMXPath($dom);
  $textNodes = $xpath->query('//*[contains(.,"'.$keyword.'")]/text()');

  foreach ($textNodes as $textNode) {
    $fragment = $dom->createDocumentFragment();
    $text = $textNode->nodeValue;

    // Split the text at each match and wrap the match in <span class="ht">
    while (($pos = stripos($text, $keyword)) !== false) {
      $fragment->appendChild(new DOMText(substr($text, 0, $pos)));
      $word = substr($text, $pos, strlen($keyword));

      $highlight = $dom->createElement('span');
      $highlight->appendChild(new DOMText($word));
      $highlight->setAttribute('class', 'ht');
      $fragment->appendChild($highlight);

      $text = substr($text, $pos + strlen($keyword));
    }

    // Append whatever follows the last match; a string comparison keeps a trailing "0"
    if ((string) $text !== '') {
      $fragment->appendChild(new DOMText($text));
    }

    // Swap the original text node for the fragment containing the highlights
    $textNode->parentNode->replaceChild($fragment, $textNode);
  }

  // Note: this returns a complete document, including the doctype, <html> and <body>
  return $dom->saveHTML();
}
Denis Washington
  • This answer helped with [my question](http://stackoverflow.com/questions/15526781/regular-expression-negative-lookahead-lookbehind-to-exclude-html-from-find-and-r). Thanks! – TerranRich Mar 20 '13 at 17:29
  • Thank you so much for being bored! :-) – Alexandre R. Janini Jul 30 '14 at 13:30
  • Omg, finally. @denisw you're a legend. I am seeing this error though when I run it on the results: "Severity: Warning Message: DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity" Any ideas? – Solvision Nov 05 '16 at 23:26
  • Found the issue. Poorly formed HTML in source. Fixed by adding: libxml_use_internal_errors(true); above the loadHTML line – Solvision Nov 05 '16 at 23:31
  • One thing I did find was that the highlighting isn't fully case-insensitive. E.g. the keyword is "Andrew", so results with "andrew" are returned from the DB, but not highlighted – Solvision Nov 05 '16 at 23:34
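
The two issues raised in the comments above (libxml warnings on poorly formed HTML, and matches in a different case not being highlighted) can be addressed together. The following is only a minimal sketch, assuming PHP 7 or later; `highlight_any_case` is a hypothetical name, not part of the answer. The idea is to select every text node and let the case-insensitive `stripos()` do the filtering, because XPath 1.0's `contains()` is case-sensitive and skips nodes that only hold "andrew" when the keyword is "Andrew". Input encoding is not handled here, and like the answer's version it returns a complete document.

```php
<?php
// Hypothetical helper for illustration; not the answer author's code.
function highlight_any_case(string $html, string $keyword): string
{
    // An empty keyword would never advance the loop below; empty HTML cannot be parsed.
    if ($keyword === '' || trim($html) === '') {
        return $html;
    }

    // Buffer libxml warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Take every text node; stripos() below decides whether it contains a match.
    $xpath = new DOMXPath($dom);
    foreach (iterator_to_array($xpath->query('//text()')) as $textNode) {
        $text = $textNode->nodeValue;
        if (stripos($text, $keyword) === false) {
            continue; // leave nodes without a match untouched
        }

        $fragment = $dom->createDocumentFragment();
        while (($pos = stripos($text, $keyword)) !== false) {
            if ($pos > 0) {
                $fragment->appendChild($dom->createTextNode(substr($text, 0, $pos)));
            }
            // Wrap the matched text, keeping its original casing.
            $span = $dom->createElement('span');
            $span->setAttribute('class', 'ht');
            $span->appendChild($dom->createTextNode(substr($text, $pos, strlen($keyword))));
            $fragment->appendChild($span);

            $text = substr($text, $pos + strlen($keyword));
        }
        if ($text !== '') {
            $fragment->appendChild($dom->createTextNode($text));
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    return $dom->saveHTML();
}
```

Calling `highlight_any_case('<p>Andrew met andrew.</p>', 'ANDREW')` should wrap both spellings in a `span` with class `ht`. Restoring the previous `libxml_use_internal_errors()` setting keeps the change from leaking into the rest of the script.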

The above solution didn't work for me. Here's a really hacky but solid workaround that highlights the keywords without breaking the HTML.

function highlight_fancy($string, $keywords = array()) {
    $dom = new DOMDocument();
    $dom->loadHTML($string);

    // Parse once, then highlight each keyword in turn on the same document
    $xpath = new DOMXPath($dom);
    foreach ($keywords as $keyword) {
        // Select the direct text children of every element whose content contains the keyword
        $textNodes = $xpath->query('//*[contains(.,"'.$keyword.'")]/text()');

        foreach ($textNodes as $textNode) {
            $fragment = $dom->createDocumentFragment();
            $text = $textNode->nodeValue;

            while (($pos = stripos($text, $keyword)) !== false) {
                $fragment->appendChild(new DOMText(substr($text, 0, $pos)));
                $word = substr($text, $pos, strlen($keyword));

                $highlight = $dom->createElement('span');
                $highlight->appendChild(new DOMText($word));
                $highlight->setAttribute('class', 'hl');
                $fragment->appendChild($highlight);

                $text = substr($text, $pos + strlen($keyword));
            }

            // A string comparison keeps a trailing "0" that empty() would drop
            if ((string) $text !== '') {
                $fragment->appendChild(new DOMText($text));
            }

            $textNode->parentNode->replaceChild($fragment, $textNode);
        }
    }

    // Hack: cut out the markup between "<body><p>" and "</p></body>".
    // This assumes the serialized body starts and ends with a <p> element.
    $html = $dom->saveHTML();
    $e = explode("<body><p>", $html);
    $e = explode("</p></body>", $e[1]);
    return $e[0];
}
jfaron
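
The `explode()` calls at the end of `highlight_fancy` depend on the serialized output starting with `<body><p>`. A possible alternative, shown here only as a sketch under the assumption of PHP 7 or later, is to serialize the children of `<body>` one by one. `body_inner_html` is a hypothetical helper name; `DOMDocument::saveHTML()` has accepted a node argument since PHP 5.3.6.

```php
<?php
// Hypothetical helper: returns the markup inside <body> without string splitting.
function body_inner_html(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing was parsed into a <body>
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // Passing a node makes saveHTML() serialize only that subtree.
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

// Usage: every top-level element is kept, with no doctype or <html>/<body> wrapper.
$dom = new DOMDocument();
$dom->loadHTML('<p>first</p><p>second</p>');
echo body_inner_html($dom);
```

Unlike the string split, this keeps the outer `<p>` tags of every paragraph and does not care which element comes first in the body.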