We're showing the results of a text search across several posts, and we want the search terms highlighted in the resulting page. Right now we do this on the backend: we go through the post text and title and wrap every occurrence of a search term in a <strong class="highlight"> tag. This happens in PHP, like so:

foreach($terms as $t) {
  $t_text = preg_replace('/\W/u','',$t);
  $re = "@\b$t_text\b@ui";
  if( !in_array($t_text, ['a','em','strong','span','div','blockquote','font'])) {
    if(strlen($t_text) > 1) {
      $row['text'] = preg_replace($re,'<strong class="highlight">$0</strong>',$row['text'] );
      $row['title'] = preg_replace($re,'<strong class="highlight">$0</strong>',$row['title'] );
    }
  }
}

(Where $terms is the list of search terms, and $row is a post which contains the search terms)

We have noticed that when the search term appears in a tag's attribute (e.g. we're searching for "foo" and we have a link like <a href="foo.php">Some text including foo</a>), the term in the attribute is also wrapped, which breaks the attribute and leaves us with horribly mangled markup. Is there some way to do this with JS instead of making the backend regex much more complicated? (I'm open to a jQuery solution.)

I have already tried various methods, including the :contains selector, filtering out everything but leaf nodes (this doesn't work because the a tag may have tags inside of it), and filtering out everything but text nodes (this doesn't work because attribute text is included here too).

In case this isn't totally clear, here's an example of the markup for the body of a hypothetical post (we store these directly in a database), where the search term is "post":

<p>
  Here is <a href="post.php?id=42"><em>another</em> post</a> that is relevant to this one.
</p>

After running the above PHP code on the post, it looks like this:

<p>
  Here is <a href="<strong class="highlight....

... and the whole thing is broken.

... Update

To those requesting relevant code: I don't know how to solve this problem, so there is not any code to include yet. I am asking what the code to solve this problem should look like. If there is something about the question that is unclear, please point it out and I will gladly expand.

samson
  • Please include all relevent code, otherwise we can't help you. – James Douglas Apr 02 '18 at 20:01
  • @JamesDouglas I'm not sure what code you need, can you be more specific? Do you need the backend code? I intend to abandon that code, so it's not super relevant... – samson Apr 02 '18 at 20:14
  • *Please include all **relevant** code*. – connexo Apr 02 '18 at 20:23
  • _Can you be more __specific__ about what code you need?_ Our code base contains tens of thousands of lines. Would you like all of them? – samson Apr 02 '18 at 20:27
  • I've added the current backend implementation, which is all the code that I can think of which is remotely relevant. Is there something I'm missing here? Do you need more information about how the post gets from our database into the final markup? – samson Apr 02 '18 at 20:38
  • @samson The **relevent** code means the code which wraps all instances of the word "post" in a `<strong>`. The code causing the problem. – James Douglas Apr 03 '18 at 08:23
  • @JamesDouglas ok, I've included that code. Any ideas? – samson Apr 03 '18 at 13:44

2 Answers

1

The simplest solution is to use mark.js.
It's very easy to set up and comes as a stand-alone file.
The instructions on the website are clear, and it does exactly what you are looking for.
The script highlights any word of your choosing without touching the tags.
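
A minimal sketch of what that could look like, assuming mark.js is already loaded on the page and exposes its global `Mark` constructor. The function name `highlightTerms` and the `#results` selector are made up for illustration, and the `element` and `className` values are chosen to mirror the markup the PHP code in the question produces.

```javascript
// Hypothetical wrapper around mark.js; assumes the global `Mark` exists.
// `context` may be a selector string, an element, or a NodeList.
function highlightTerms(context, terms) {
  var instance = new Mark(context);
  instance.mark(terms, {
    element: 'strong',        // wrap matches in <strong> instead of the default <mark>
    className: 'highlight',   // same class the backend code used
    accuracy: 'exactly',      // whole words only, roughly like \b...\b
    separateWordSearch: false // each array entry is already a single term
  });
}

// Usage in the browser, once the results are in the DOM:
// highlightTerms('#results', ['post', 'foo']);
```

Because mark.js works on text nodes, an attribute such as href="post.php?id=42" should be left alone.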

-1

update

I was playing around with this to improve on it and found some annoying bugs. In trying to correct them I came across a couple of answers that are much more developed than my own: "Javascript Regex to replace text NOT in html attributes".

This is a long and ugly possible JS/jQuery solution; a better coder could come up with something more concise, but it should get you started.

explanation

You'll have to assign the search term to the variable gSearchedFor.

I was unable to combine * with :contains, so I created an array that you will have to fill out manually with the elements you want to search. (I'll come back and update if I figure out how to combine them so an array isn't needed.)

It first looks for the original casing of the search term. Once it has looped through all the elements, it capitalizes the first letter of the search term and runs again, looking for the capitalized version.

fiddle

https://jsfiddle.net/Hastig/Lz53959L/

var gSearchedFor = 'foo';
var elementsArray = ['h2','div','a']; // add to these (I couldn't figure out how to combine the all selector (*) with :contains)
var elementsArrayLength = elementsArray.length;
var loopCounter = 0;
$.each(elementsArray, function() {
  loopCounter++;
  var thisElement = this;
  // from https://stackoverflow.com/a/16090558/3377049
  $(""+thisElement+":contains('"+gSearchedFor+"')").each(function() {
    var replaceWithThis = $(this).html().replace(gSearchedFor, "<span class='highlighter'>"+gSearchedFor+"</span>");
    $(this).html(replaceWithThis);
  });
  // now search for capitalized versions of search term
  if (elementsArrayLength === loopCounter) {
    // from https://stackoverflow.com/a/42294347/3377049
    gSearchedFor = gSearchedFor.substring(0,1).toUpperCase() + gSearchedFor.substring(1,gSearchedFor.length);
    $.each(elementsArray, function() {
      var thisElement = this;
      // from https://stackoverflow.com/a/16090558/3377049
      $(""+thisElement+":contains('"+gSearchedFor+"')").each(function() {
        var replaceWithThis = $(this).html().replace(gSearchedFor, "<span class='highlighter'>"+gSearchedFor+"</span>");
        $(this).html(replaceWithThis);
      });
    });
  }
});
.highlighter {
  background-color: yellow;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<h2>Highlighting Foo</h2>
<div class="foo">
  Some test text looking for foo and here's a reference to <a href="foo">foo</a>.
</div>
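
The "replace text NOT in html attributes" idea mentioned in the update can also be sketched without jQuery, by splitting the HTML string on tags and only touching the pieces in between. This is a rough illustration, not the code from the linked answers: the function name `highlightOutsideTags` is made up, it assumes plain ASCII search terms, and it assumes well-formed markup with no ">" inside attribute values, comments, or script blocks.

```javascript
// Hypothetical helper: wrap whole-word matches in <strong class="highlight">,
// but only in the text between tags, never in tag names or attributes.
function highlightOutsideTags(html, terms) {
  // Strip non-word characters and drop one-letter terms, as the PHP version does.
  var words = terms
    .map(function (t) { return t.replace(/\W/g, ''); })
    .filter(function (t) { return t.length > 1; });
  if (words.length === 0) return html;

  var re = new RegExp('\\b(?:' + words.join('|') + ')\\b', 'gi');

  // Splitting on a capturing group keeps the tags in the result:
  // even indexes are text, odd indexes are tags.
  return html
    .split(/(<[^>]*>)/)
    .map(function (part, i) {
      return i % 2 === 1
        ? part
        : part.replace(re, '<strong class="highlight">$&</strong>');
    })
    .join('');
}
```

With the link from the question as input and the term "foo", the href stays untouched and only the "foo" in the link text gets wrapped. Since tags are skipped entirely, the list of tag names to ignore ('a', 'em', 'strong', ...) is no longer needed.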
Hastig Zusammenstellen
  • Sweet! I actually ended up going with [this answer](https://stackoverflow.com/a/3241437/1166029), which was attached to a question referenced in the one you referenced at the top of your answer. Thanks for the breadcrumbs! – samson Apr 03 '18 at 14:01