JavaScript RegExp - How to match a word based on conditions

Question

I'm building a search results page (in Angular) and using regular expressions to highlight the searched 'keywords' based on a condition. I'm having trouble getting the RegExp condition right, so apologies if my current syntax is messy; I've been playing about for hours.

Basically, for this test I'm highlighting the word 'midlands', and I want to highlight every occurrence of 'midlands' except those inside the href="" attribute of an <a> tag. I don't want to highlight anything that's part of a URL, because I'll be wrapping the keywords in a span and that would break the URL structure. Can anyone help? I think I'm almost there.

Here's the current RegExp I'm using:

/(\b|^|)(\s|\()midlands(\b|$)(|\))/gi

Here's a link to test what I'm after. https://regex101.com/r/wV4gC3/2

Further info: after the view has rendered, I grab the HTML content of the repeating results and then run the search on the rendered HTML with the condition above, if this helps anyone.

It doesn't match the ones inside href attributes. Or at least it doesn't look so from the provided link. Could you be a bit more straightforward about what the issue is, please? – Dropout, Dec 08 '15 at 11:44
Do you mean you want to only get the instances of midlands in between tags and not in attributes? Or specifically not in the href attribute? – synthet1c, Dec 08 '15 at 11:48
Hi, thanks for your responses. I would like to not get a match within the attributes, i.e. href etc. Otherwise this will break my HTML. Thanks. @Jacques it's almost working, but there are two instances that should match and don't because of the '>' character – GBrooksy, Dec 08 '15 at 12:26

score 1 · Answer 1 · edited May 23 '17 at 11:59

You're going about this all wrong. Don't parse HTML with regular expressions. Use the DOM's built-in HTML parser and explicitly run the regex on text nodes only.

First we get all the text nodes. With jQuery that's:

// .contents() returns the immediate child nodes, text nodes included
var texts = $(elem).contents().get().filter(function (el) {
    return el.nodeType === 3; // 3 is Node.TEXT_NODE
});

Otherwise - see the answer here for code for getting all text nodes in VanillaJS.
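
For reference, a minimal sketch of collecting text nodes without jQuery. This is not the code from the linked answer; the function name `collectTextNodes` is made up for illustration, and it relies only on the standard `nodeType` and `childNodes` properties. Unlike the jQuery snippet above, it descends into nested elements.

```javascript
// Recursively collect every text node at or below `root`.
// `collectTextNodes` is a hypothetical helper name, not a DOM or jQuery API.
function collectTextNodes(root) {
    var TEXT_NODE = 3; // same value as Node.TEXT_NODE
    var found = [];
    (function walk(node) {
        if (node.nodeType === TEXT_NODE) {
            found.push(node);
            return;
        }
        // Elements, fragments and documents expose their children here.
        var children = node.childNodes || [];
        for (var i = 0; i < children.length; i++) {
            walk(children[i]);
        }
    })(root);
    return found;
}
```

In a browser it could be called as `var texts = collectTextNodes(elem);`, where `elem` is the container element holding the rendered results.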

Then, iterate them and replace the relevant text only in the text nodes:

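for (var text of texts) { // in old browsers: angular.forEach(texts, function (text) { ... })
    // Caveat: textContent is plain text, so the <b> tags below show up literally
    // rather than as markup; real highlighting needs element nodes.
    text.textContent = text.textContent.replace(/midlands/g, function (m) {
        return "<b>" + m + "</b>"; // surround with <b>s
    });
}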
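
Since a string assigned to textContent is never parsed as HTML, one way to get real `<b>` elements is to swap each text node for a fragment of text nodes and `<b>` elements. The following is only a sketch under some assumptions: the name `highlightTextNode` is hypothetical, the keyword is hard-coded as in the loop above, and matching is case-insensitive to mirror the `i` flag in the question's regex. The attribute values of `<a href="...">` are never touched, because only text nodes are processed.

```javascript
// Replace one text node with text nodes plus <b> elements around each match.
// `doc` is the owning document, passed in explicitly (normally `document`).
function highlightTextNode(textNode, doc) {
    // With a capturing group, split() keeps the matches: they land at the
    // odd indices, the unmatched text at the even indices.
    var parts = textNode.textContent.split(/(midlands)/gi);
    if (parts.length === 1) {
        return; // no match, leave the node alone
    }
    var frag = doc.createDocumentFragment();
    parts.forEach(function (part, i) {
        if (part === "") {
            return; // empty piece when a match sits at the very start or end
        }
        if (i % 2 === 1) {
            var b = doc.createElement("b");
            b.textContent = part; // plain text, so nothing gets re-parsed
            frag.appendChild(b);
        } else {
            frag.appendChild(doc.createTextNode(part));
        }
    });
    textNode.parentNode.replaceChild(frag, textNode);
}
```

Inside the loop this would be called as `highlightTextNode(text, document);` instead of assigning to `text.textContent`.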


answered Dec 08 '15 at 11:53

Benjamin Gruenbaum


Thanks, I've looked into this and got a solution working with the jQuery method, although I had to look into each individual element and then do an 'each' with each keyword if supplied, which is long-winded. I'll need to check performance. I was hoping regex would have a syntax to filter out '/' and '-' characters etc., maybe? – GBrooksy Dec 08 '15 at 13:18
No, I worked on such a proposal, but because of an edge case no one will run into anyway it is being severely delayed. You can grab a short polyfill here: https://github.com/benjamingr/RegExp.escape for escaping a given word for `-` and `/` and so on. – Benjamin Gruenbaum Dec 08 '15 at 13:33
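
To illustrate the escaping mentioned in the last comment, here is a hand-written sketch. It is not the code of the linked polyfill, and both function names (`escapeRegExp`, `keywordPattern`) are hypothetical. It assumes keywords arrive as plain strings and should be matched literally and case-insensitively.

```javascript
// Backslash-escape characters that have a special meaning in a regex,
// plus `-` and `/`, so a user-supplied keyword is matched literally.
function escapeRegExp(text) {
    return String(text).replace(/[\\^$.*+?()[\]{}|\/-]/g, "\\$&");
}

// Build a global, case-insensitive pattern for one keyword. The capturing
// group makes the pattern usable with String.prototype.split() as well.
function keywordPattern(keyword) {
    return new RegExp("(" + escapeRegExp(keyword) + ")", "gi");
}
```

For example, `keywordPattern("c++")` matches the literal text "C++" instead of throwing a syntax error, and `keywordPattern("a.b")` no longer matches "axb".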
