I am building an autocomplete in JavaScript that needs to highlight words when doing a search:

[image: highlighting example]

That works fine, but there's an issue with escaped characters. When trying to highlight a text with escaped characters (for example regex &>< example), the following is happening:

[image: regex highlighting]

That's happening because I am doing the following:

element.innerHTML.replace(/a/g, highlight)

function highlight(str) {
  return '<span class="foo">' + str + '</span>';
}

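and innerHTML contains the entity &amp; for the & character, so the letter a inside that entity is matched too. The result makes sense, but it is not what I want.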
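
To make the failure concrete, here is a minimal sketch that applies the same replace to a plain string. The variable serialized is a hand-written stand-in for what element.innerHTML would return for the text regex &>< example; that value is an assumption made so the snippet runs without a DOM.

```javascript
// Same highlighter as in the question
function highlight(str) {
  return '<span class="foo">' + str + '</span>';
}

// Hand-written stand-in for element.innerHTML when the element
// shows the text "regex &>< example" (assumption, no DOM involved)
var serialized = 'regex &amp;&gt;&lt; example';

// The "a" inside the entity &amp; is matched as well, which breaks the entity
var broken = serialized.replace(/a/g, highlight);

console.log(broken);
// regex &<span class="foo">a</span>mp;&gt;&lt; ex<span class="foo">a</span>mple
```

The first span lands in the middle of &amp;, so the browser no longer sees an entity there.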

In short, I need a function that:

  • receives a and regex <br> example and returns regex &lt;br&gt; ex<span class="foo">a</span>mple
  • receives r and regex <br> example and returns <span class="foo">r</span>egex &lt;b<span class="foo">r</span>&gt; example
  • receives < and regex <br> example and returns regex <span class="foo">&lt;</span>br&gt; example
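
One way to read these three requirements is as a pure string function that works on the raw text and escapes each piece separately. The sketch below is only an illustration under that reading: escapeHtml and highlightText are hypothetical names, the matching is literal and case-sensitive, and the result still has to be assigned to innerHTML by the caller.

```javascript
// Hypothetical helper: escape the characters that are special in HTML text
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: wrap every literal occurrence of term in a span
function highlightText(text, term) {
  if (term === '') {
    return escapeHtml(text); // nothing to highlight
  }
  // Split the raw text on the raw term, escape each piece on its own,
  // then join the pieces with the escaped, wrapped term
  var wrapped = '<span class="foo">' + escapeHtml(term) + '</span>';
  return text.split(term).map(escapeHtml).join(wrapped);
}

console.log(highlightText('regex <br> example', 'a'));
// regex &lt;br&gt; ex<span class="foo">a</span>mple
console.log(highlightText('regex <br> example', 'r'));
// <span class="foo">r</span>egex &lt;b<span class="foo">r</span>&gt; example
console.log(highlightText('regex <br> example', '<'));
// regex <span class="foo">&lt;</span>br&gt; example
```

Because the search runs on the unescaped text, a term such as a can never match inside an entity like &amp;.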

The entries may or may not contain html blocks, see the issue here (search for <br> or &)

lante
  • If you don't want the entities returned, use `textContent` to get the content, and `innerHTML` to set it with the span included? – adeneo Aug 13 '16 at 22:02
  • `textContent` has characters escaped (e.g. `&`); it's just excluding HTML such as `span`s, so that won't work – lante Aug 13 '16 at 22:05
  • This looks like another example of those `You can't parse html with regular expressions` scenarios: http://stackoverflow.com/a/1732454/25216 – Andrew Shepherd Aug 13 '16 at 22:16
  • @AndrewShepherd There is no parsing nor regular expressions involved here... (unless you count the single char `/a/` as a whole regex) – rvighne Aug 13 '16 at 22:17
  • Is the `itemElem` populated with the escaped (non-highlighted) text and then the highlighter will attempt to include the `span` to highlight the text? Or will a function be given the raw unescaped string and then set the highlighted escaped text to the `itemElem`? – Jason Cust Aug 13 '16 at 22:22
  • @JasonCust, the element is populated once, so the function should receive the unescaped string and then generate an escaped string with the HTML modified to highlight the given word – lante Aug 13 '16 at 22:24
  • @lante - [Did you try it?](https://jsfiddle.net/czdrg2j3/1/) – adeneo Aug 13 '16 at 22:32
  • I updated my answer to cover both scenarios. I hope it helps! :) – Jason Cust Aug 13 '16 at 22:40
  • @adeneo sorry for the misunderstanding; at that moment `innerHTML` is already escaped: https://jsfiddle.net/8rd8mbm9/ – lante Aug 13 '16 at 22:43
  • What are you using on the server side? – Robert Aug 13 '16 at 22:43
  • @RobertRocha nothing, just plain javascript on the client – lante Aug 13 '16 at 22:44
  • What you are trying to do is autocomplete. And if you do have something on the server side, this can be so much simpler than using regexes. – Robert Aug 13 '16 at 22:51
  • What are you comparing the input to? – Robert Aug 13 '16 at 22:55
  • And I still don't get it, if you want to get rid of the entities you'd still use `textContent` -> **https://jsfiddle.net/8rd8mbm9/1/** – adeneo Aug 13 '16 at 23:01
  • @adeneo I updated the question with a working example with the issue, see [here](https://jsfiddle.net/8rd8mbm9/2/) and try to look for `&` for example – lante Aug 13 '16 at 23:35
  • Don't reinvent the wheel, use e.g. [mark.js](https://markjs.io/). – dude Aug 14 '16 at 04:09
  • Btw: What's the point for not using e.g. typeahead.js? – dude Aug 14 '16 at 04:23
  • Using innerHTML is evil as it destroys events and regenerates the DOM – dude Aug 14 '16 at 04:26
  • @dude thanks for the suggestions, but what I need is much simpler. Also, typeahead doesn't cover a use case that I need and that's why I am building another autocomplete, but that's not part of the question :) – lante Aug 14 '16 at 19:22

1 Answer

str.replace only returns a new string with the intended replacements. The original string is unchanged.

var str = 'replace me';
var str2 = str.replace(/e/g, 'E');

// For display only
document.write('<pre>' + JSON.stringify({
  str: str,
  str2: str2
}, null, 2) + '</pre>');

Therefore the code needs to set the returned value from the replace back to the desired element.

Also, reading innerHTML returns the escaped (serialized) text rather than the unescaped text. It could be unescaped inside the function, but there is no need when textContent already returns the raw text. Note that assigning to innerHTML does not escape anything: the string is parsed as markup, so every part of the text other than the highlight span has to be escaped before it is set on the element.

UPDATE: the values are passed to the function and then set to the element:

NOTES:

  • The regexp could probably be made a bit more robust to avoid having to handle the special case using lastIndex
  • There needs to be some protection on the input as someone could provide a nasty regexp pattern. There is a minimal protection check in this example.

higlightElemById('a', 'regex &>< example', 'a');
higlightElemById('b', 'regex &>< example', '&');
higlightElemById('c', 'regex <br> example', '<');
higlightElemById('d', 'regex <br> example', 'e');
higlightElemById('e', 'regex <br> example', '[aex]');

function higlightElemById(id, str, match) {
  var itemElem = document.getElementById(id);

  // Nothing to highlight: an empty pattern would match forever
  if (!match) {
    itemElem.textContent = str;
    return;
  }

  // minimal regexp escape to prevent shenanigans
  var safeMatch = match.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // construct regexp to match highlight text;
  // [\s\S] (rather than .) lets the prefix span line breaks
  var regexp = new RegExp('([\\s\\S]*?)(' + safeMatch + ')', 'g');
  var text = '';
  var lastIndex = 0;
  var matches;

  while ((matches = regexp.exec(str)) !== null) {
    // Escape the non-matching prefix
    text += escapeHTML(matches[1]);
    // Escape the match as well, then highlight it
    text += highlight(escapeHTML(matches[2]));
    // Remember where the last match ended
    lastIndex = regexp.lastIndex;
  }

  // Escape whatever follows the last match (the whole string if none matched)
  text += escapeHTML(str.slice(lastIndex));

  itemElem.innerHTML = text;
}

function highlight(str) {
  return '<span class="myHighlightClass">' + str + '</span>';
}

// Reuse one detached textarea to escape text
// (avoids relying on `this`, which is undefined in strict mode)
var escapeEl = document.createElement('textarea');

function escapeHTML(html) {
  escapeEl.textContent = html;
  return escapeEl.innerHTML;
}

.myHighlightClass {
  text-decoration: underline;
  color: red;
}

<div id="a"></div>
<div id="b"></div>
<div id="c"></div>
<div id="d"></div>
<div id="e"></div>

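
The note about protecting the input can be checked in isolation. The sketch below pulls the escaping step into a helper, here called escapeRegExp (a hypothetical name, using the same character class as the safeMatch line), and compares how the input [aex] behaves before and after escaping.

```javascript
// Same character class as the safeMatch line in the answer
function escapeRegExp(pattern) {
  return pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var unsafe = '[aex]';
var safe = escapeRegExp(unsafe);

// Unescaped, the input is a character class matching a, e or x
var asPattern = 'regex <br> example'.match(new RegExp(unsafe, 'g'));
// Escaped, it only matches the literal text "[aex]"
var asLiteral = 'regex <br> example'.match(new RegExp(safe, 'g'));

console.log(safe);      // \[aex\]
console.log(asPattern); // [ 'e', 'e', 'x', 'e', 'x', 'a', 'e' ]
console.log(asLiteral); // null
```

With the escape in place, the call with '[aex]' highlights nothing, because the literal text [aex] does not occur in the string.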
Jason Cust
  • Thanks for your answer, but I am seeing an extra case which is not covered: https://jsfiddle.net/gemxuc3e/ (I updated the question to clarify) – lante Aug 13 '16 at 23:06
  • @lante What is your desired output in that case? Would `<br>` be escaped or not in the input, and would it need to be escaped in the output element? – Jason Cust Aug 13 '16 at 23:20
  • See [here](https://jsfiddle.net/8rd8mbm9/2/) an example, the entries may or may not contain html blocks (I can't forbid the user to put html blocks as entries) search for `br` or `&` – lante Aug 13 '16 at 23:34
  • @lante Does it need to be a global match or would highlighting only the first match work? – Jason Cust Aug 14 '16 at 00:00
  • global match since it's an autocomplete – lante Aug 14 '16 at 00:01
  • @lante I made some edits but in all honesty if you keep changing the original question it makes it very hard to answer. It's already at a point of possibly needing a bounty. – Jason Cust Aug 14 '16 at 01:51