How to highlight the search-result of a text-query within an html document ignoring the html tags?

Question

I have a string which has html content in it. Something like this

  const text = 'My name is Alan and I <span>an</span> <div class="someClass">artist</div>.'

I render this inside a react component using dangerouslySetInnerHTML. This text is really long and has different types of HTML tags in it.

I want to search for a word and highlight it in that document as the user is typing. The functionality is similar to the browser's find (cmd + f) feature. As you type the text should get highlighted.

This is what I am looking for:

 user types `an`
 const text = 'My name is Alan and I <span>an</span> <div class="someClass">artist</div>.'
result: "My name is Al<mark>an</mark> and I <span><mark>an</mark></span> <div class="someClass">artist</div>."

I tried using this library https://github.com/bvaughn/react-highlight-words but the issue is it highlights the text inside the tags too and messes up the content.

result: "My name is Al<mark>an</mark> and I <sp<mark>an</mark>><mark>an</mark></span> <div class="someClass">artist</div>."

Then I thought I'd use my own regex and came up with this:

const regex = new RegExp(((`${searchedText}`)(?![^<>]*>)))

but React (ESLint) throws this error at the `?`:

This experimental syntax requires enabling the parser plugin: 'partial Application'

Here's my code:

get highlightedText() {
      if (searchText === '') return self.renderedText;
      const regex = new RegExp((`${searchText}`)((?![^<>]*>)));
      const parts = self.renderedText.split(regex);
      return parts
         .map(part => (regex.test(part) ? `<mark>${part}</mark>` : part))
         .join('');
    },

I am not sure what I am doing wrong. The regex works perfectly fine as I tested the regex using regextester.com

Any help is appreciated. Thanks!

score 1 · Answer 1 · answered Jul 10 '20 at 23:25

A regex-based approach that manipulates HTML markup at the string level works only for strictly valid, non-nested markup, like the example given by the OP.

const text = 'My name is Alan and I\'m <span>an</span> <div class="someClass">artist</div>.'

Such an approach does not work for nested HTML markup like the following:

const text = 'My name is Alan and I\'m <span><em>an</em></span> <div><em>artist</em></div>.'
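
To illustrate why nesting breaks it, here is a small sketch that splits the nested example with the simple tag/text/tag regex used further down in this answer. The variable names `nested`, `fragments` and `mixed` are only illustrative. Some of the resulting fragments mix text and tags, so a later search-and-replace over "text" fragments would also touch tag names such as `span`.

```javascript
const nested = 'My name is Alan and I\'m <span><em>an</em></span> <div><em>artist</em></div>.';

// Same pattern as the simple-markup regex of this answer: <tag>text</tag>.
const regXSimpleMarkup = /(<[^>]+>)([^<]+)(<\/[^>]+>)/g;

const fragments = nested.split(regXSimpleMarkup);
console.log(fragments);

// Fragments that contain a `<` but are not a single, complete tag:
// these would be handled as plain text although they carry markup.
const mixed = fragments.filter(
  fragment => fragment.includes('<') && !/^<[^>]+>$/.test(fragment)
);
console.log(mixed);
```

Only the innermost `<em>…</em>` pairs are separated cleanly; the outer `<span>` and `<div>` tags stay glued to the surrounding text.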

For the OP's use case, a regex must match and remember the opening and closing tags as well as the enclosed text content, so that it never accidentally manipulates the markup itself. This calls for capturing groups.

An example regex that uses named groups:

const test = 'My name is Alan and I\'m <span>an</span> <div class="someClass">artist</div>.'

const regXSimpleMarkup = (/(?<tagStart><[^>]+>)(?<text>[^<]+)(?<tagEnd><\/[^>]+>)/g);

[...test.matchAll(regXSimpleMarkup)].forEach((match, idx) =>
  console.log(`match ${ idx } :: groups : `, match.groups)
);

console.log([...test.matchAll(regXSimpleMarkup)]);

However, as the output of the code above shows, this does not match or capture the text content before or after an HTML tag. One should therefore combine a capturing regex with split:

const test = 'My name is Alan and I\'m <span>an</span> <div class="someClass">artist</div>.'

// const regXSimpleMarkup = (/(?<tagStart><[^>]+>)(?<text>[^<]+)(?<tagEnd><\/[^>]+>)/g);
const regXSimpleMarkup = (/(<[^>]+>)([^<]+)(<\/[^>]+>)/g);

console.log(test.split(regXSimpleMarkup));

As shown above, for the OP's example the result is a cleanly separated list of markup fragments. This list can now be processed step by step: the search-and-replace mechanism (find the substring and wrap it in highlighting markup) is applied only to fragments that are text content, while the new HTML markup string is built up programmatically with each iteration.

//  How to escape regular expression special characters using javascript?
//
//  [https://stackoverflow.com/questions/3115150/how-to-escape-regular-expression-special-characters-using-javascript/9310752#9310752]
//
function escapeRegExpSearchString(text) {
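  // `\s` (not `\\s`) inside the class: otherwise a literal "s" in the search term would be turned into `\s`.
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');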
}


function createTextSearchMarkup(fragment, search, isCaseSensitive) {
  const flags = `g${ !!isCaseSensitive ? '' : 'i' }`;

  search = escapeRegExpSearchString(search);
  search = RegExp(`(${ search })`, flags);

  return fragment.replace(search, '<mark>$1</mark>');
}

function concatTextSearchMarkup(collector, fragment) {
  const regXTag = (/^<[^>]+>$/);

  if (!regXTag.test(fragment)) {

    fragment = createTextSearchMarkup(
      fragment,
      collector.search,
      collector.isCaseSensitive
    );
  }
  collector.markup = [collector.markup, fragment].join(''); // concat.

  return collector;
}

function getHighlightTextSearchMarkup(markup, search, isCaseSensitive) {
//const regXSimpleMarkup = (/(?<tagStart><[^>]+>)(?<text>[^<]+)(?<tagEnd><\/[^>]+>)/g);
  const regXSimpleMarkup = (/(<[^>]+>)([^<]+)(<\/[^>]+>)/g);

  return markup.split(regXSimpleMarkup).reduce(
    concatTextSearchMarkup, {
      isCaseSensitive,
      search,
      markup: ''
    }
  ).markup;
}


const markup = 'My name is Alan and I\'m <span>an</span> <div class="someClass">artist</div>.'

console.log('original markup => ', markup);

console.log(
  'case insensitive search for "an" => ',
  getHighlightTextSearchMarkup(markup, 'an')
);
console.log(
  'case insensitive search for "i" => ',
  getHighlightTextSearchMarkup(markup, 'i')
);
console.log(
  'case sensitive search for "i" => ',
  getHighlightTextSearchMarkup(markup, 'i', true)
);

Note

For nested markup within HTML template strings, one needs an approach that takes advantage of the browser's native HTML parsing, e.g. via an HTML (fragment) node that is never attached to the document's DOM.
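
A minimal sketch of that idea, assuming a browser environment where `DOMParser` is available. The function name `highlightInParsedHtml` and its parameters are hypothetical and not part of the code above. The markup is parsed into a detached document, only its text nodes are visited, and each match is wrapped in a `<mark>` element, so tag names and attributes are never touched.

```javascript
// Hypothetical helper (browser only): highlights `search` in text nodes of parsed markup.
function highlightInParsedHtml(markup, search, isCaseSensitive = false) {
  if (!search) return markup;

  // Parse into a detached document; nothing is attached to the live page.
  const doc = new DOMParser().parseFromString(markup, 'text/html');

  // Collect the text nodes first, because the tree is modified afterwards.
  const walker = doc.createTreeWalker(doc.body, NodeFilter.SHOW_TEXT);
  const textNodes = [];
  while (walker.nextNode()) textNodes.push(walker.currentNode);

  // Escape regex metacharacters so the search term is matched literally.
  const escaped = search.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`(${escaped})`, isCaseSensitive ? 'g' : 'gi');

  for (const node of textNodes) {
    // With one capturing group, odd indices of the split result are the matches.
    const parts = node.nodeValue.split(regex);
    if (parts.length === 1) continue; // no match in this text node

    const fragment = doc.createDocumentFragment();
    parts.forEach((part, idx) => {
      if (part === '') return;
      if (idx % 2 === 1) {
        const mark = doc.createElement('mark');
        mark.textContent = part;
        fragment.appendChild(mark);
      } else {
        fragment.appendChild(doc.createTextNode(part));
      }
    });
    node.replaceWith(fragment);
  }
  return doc.body.innerHTML;
}
```

In a browser, `highlightInParsedHtml(markup, 'an')` could then be passed to whatever renders the HTML string. Note that the returned string is the parser's serialization, so it may differ in insignificant details (such as attribute quoting) from the input.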

I can't thank you enough for all the effort you have put on this answer. Your solution worked like charm on my react native project. — ArkaneKhan, Apr 02 '21 at 04:11

Vlad · Answer 2 · 2020-07-10T16:08:22.837

Try this:

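    import { compose, join, map, split, trim } from 'lodash/fp';

    function highlightedText(yourText, searchValue) {
      if (!searchValue) return yourText;

      // Negative lookahead: skip matches that are followed by `>` before any `<`, i.e. inside a tag.
      const rgx = "?![^<>]*>";
      // No `g` flag: split() finds every match anyway, and test() stays stateless.
      // Note: searchValue is not regex-escaped here.
      const regex = new RegExp(`(${trim(searchValue)})(${rgx})`, 'i');

      return compose(
        join(''),
        map(part => (regex.test(part) ? `<span style="background-color: #fff200;">${part}</span>` : part)),
        split(regex)
      )(yourText);
    }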

Note that I use compose, map, join, split and trim from lodash/fp. For larger texts or big datasets, js-coroutines ( http://js-coroutines.com/ ) may be a better choice for the data manipulation.
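
For comparison, a dependency-free sketch of the same lookahead idea. The names `escapeRegExp` and `highlightOutsideTags` are illustrative, not from the original code. The key point is that the lookahead lives inside the pattern string. Written as bare JavaScript, as in the question, `(?!` is parsed as code, which is presumably what triggers the parser error quoted there.

```javascript
// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every occurrence of `searchValue` that is not inside a tag in <mark>.
// The lookahead (?![^<>]*>) rejects a match when the next angle bracket
// after it is a `>`, i.e. when the match sits inside a tag.
function highlightOutsideTags(html, searchValue) {
  const term = searchValue.trim();
  if (!term) return html;

  const regex = new RegExp(`(${escapeRegExp(term)})(?![^<>]*>)`, 'gi');
  return html.replace(regex, '<mark>$1</mark>');
}

const sample = 'My name is Alan and I <span>an</span> <div class="someClass">artist</div>.';
console.log(highlightOutsideTags(sample, 'an'));
```

Like the browser's find feature, this also marks the "an" in "and". It relies on well-formed markup and would be confused by a literal `>` inside text content or attribute values.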
