Applying RegEx on all text in element

Question

I'm trying to dynamically replace specific words with a link within a certain HTML element using JS. I figured I'd use a simple RegEx:

var regEx = new RegExp('\\b'+text+'\\b', 'gi');

The quick'n'nasty way is to apply the RegEx replace to the context div's innerHTML property:

context.innerHTML = context.innerHTML.replace(regEx, '<a href="#">'+text+"</a>");

The problem is that this also matches inside the markup itself (image titles, say), which breaks the layout of the page. I want it to apply only to the text of the page, if possible also excluding things like header tags and, of course, HTML comments and such.

So I tried something like this instead, but it doesn't seem to work at all:

function replaceText(context, regEx, replace) {
    var childNodes = context.childNodes;
    for (n in childNodes) {
        console.log(childNodes[n].nodeName);
        if (childNodes[n] instanceof Text) {
            childNodes[n].textContent = childNodes[n].textContent.replace(regEx, replace);
        } else if (childNodes[n] instanceof HTMLElement) {
            replaceText(childNodes[n], regEx, replace);
            console.log('Entering '+childNodes[n].nodeName);
        } else {
            console.log('Skipping '+childNodes[n].nodeName);
        }
    }
}

Can anyone see what I'm doing wrong, or maybe come up with a better solution? Thanks!

UPDATE:

Here's a snippet of what the contents of context may look like:

<h4>Newton's Laws of Motion</h4>
<p><span class="inline_title">Law No.1</span>: <span class="caption">An object at rest will remain at rest, and an object in motion will continue to move at constant velocity, unless a net force is applied.</span></p>
<ul>Consequences: <li>Conservation of Momentum in both elastic and inelastic collisions</li>
<li>Conservation of kinetic energy in elastic collisions but not inelastic.</li>
<li>Conservation of angular momentum.</li>
</ul>
<h5>Equations</h5>
<p class="equation">&rho; = mv</p>
<p>where &rho; is the momentum, and m is the mass of an object moving at constant velocity v.</p>

You shouldn't do that part with RegEx. It's better to narrow the loop down to each node's own content, with its child nodes excluded. — , Dec 22 '13 at 13:57
@PraveenJeganathan it's nothing special, just a div containing a bunch of
and and images and stuff... — Sean Bone, Dec 22 '13 at 13:58
@Sean what does `context` hold? It will be much easier if you set up a fiddle. — Praveen, Dec 22 '13 at 14:02
instead of using the script on the whole body of the element use it just on your article (make a div with an id for it for instance) — Max, Dec 22 '13 at 14:03
@Sean; if you don't want to let the RegEx hit the node attributes you should only do the RegEx over the contents on that node's level and exclude the rest, so the for-loop won't trip over innerHTML of each parent-node. — , Dec 22 '13 at 14:04
@Max that's what `context` is for - the issue is that it won't work at all — Sean Bone, Dec 22 '13 at 14:06
@Allendar the second method isn't tripping over the element attributes because it's not using `innerHTML` on the whole element but `textContent` on the text nodes - or at least that's the idea, but it doesn't seem to work. — Sean Bone, Dec 22 '13 at 14:09
The for..in loops over all properties of the childNodes object, not all items. — Casimir et Hippolyte, Dec 22 '13 at 15:29

Casimir et Hippolyte · Answer 1 · 2013-12-22T17:02:53.283

You can use this:

function replaceText(context, regEx, replace)
{
    var childNodes = context.childNodes;
    for (var i = 0; i<childNodes.length; i++) {
        var childNode = childNodes[i];
        if (childNode.nodeType === 3) // 3 is for text node
            childNode.nodeValue = childNode.nodeValue.replace(regEx, replace);
        else if (childNode.nodeType === 1 && childNode.nodeName != "HEAD")
            replaceText(childNode, regEx, replace); 
    }
}
replaceText(context, /cons/ig, 'GROUIK!');

The idea is to find all the text nodes in the "context" DOM tree. That is why I use a recursive function: it searches for text nodes inside the child nodes as well.
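To illustrate why the loop above counts from 0 to childNodes.length instead of using for..in, here is a minimal sketch. A real NodeList only exists in a browser, so fakeChildNodes is a hypothetical array-like stand-in, and both helper names are made up for this example. In a browser, for..in over childNodes would similarly visit names such as "length" and "item" in addition to the indices.

```javascript
// Hypothetical stand-in for a NodeList: indexed entries plus extra enumerable properties.
var fakeChildNodes = {
    0: 'h4',
    1: 'p',
    length: 2,
    item: function (i) { return this[i]; }
};

// Collect every key that for..in visits.
function keysSeenByForIn(list) {
    var keys = [];
    for (var k in list) {
        keys.push(k);
    }
    return keys;
}

// Collect only the indexed items, the way the answer's loop does.
function itemsSeenByIndex(list) {
    var items = [];
    for (var i = 0; i < list.length; i++) {
        items.push(list[i]);
    }
    return items;
}

console.log(keysSeenByForIn(fakeChildNodes));  // → ["0", "1", "length", "item"]
console.log(itemsSeenByIndex(fakeChildNodes)); // → ["h4", "p"]
```

With for..in, the loop body also receives "length" and "item", whose values are not nodes at all, which is why the indexed loop is the safer choice here.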

Note: I test childNode.nodeName != "HEAD" in the function. This is only an example of how to skip a particular tag. In real life it is simpler to pass the body node as the parameter to the function.
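If the replacement has to be an element such as a link rather than plain text, writing markup into nodeValue will only display it literally. One possible approach, sketched below and not tested against the asker's page, is to split each text node around the matches and build real anchor elements. The names splitByMatches and linkifyText, the href parameter, and the list of skipped tags (A, H1 to H6, SCRIPT, STYLE) are assumptions made for this example.

```javascript
// Pure helper: split a string into plain and matched segments.
function splitByMatches(text, regEx) {
    // Work on a global copy so the caller's regex state (lastIndex) is untouched.
    var flags = regEx.flags.indexOf('g') === -1 ? regEx.flags + 'g' : regEx.flags;
    var re = new RegExp(regEx.source, flags);
    var parts = [];
    var last = 0;
    var m;
    while ((m = re.exec(text)) !== null) {
        if (m[0] === '') { re.lastIndex++; continue; } // avoid looping on empty matches
        if (m.index > last) {
            parts.push({ text: text.slice(last, m.index), match: false });
        }
        parts.push({ text: m[0], match: true });
        last = m.index + m[0].length;
    }
    if (last < text.length) {
        parts.push({ text: text.slice(last), match: false });
    }
    return parts;
}

// DOM part (browser only): replace each matching text node with text + <a> nodes.
function linkifyText(context, regEx, href) {
    // Snapshot the children first, because nodes are inserted and removed below.
    var children = Array.prototype.slice.call(context.childNodes);
    children.forEach(function (node) {
        if (node.nodeType === 3) { // text node
            var parts = splitByMatches(node.nodeValue, regEx);
            if (!parts.some(function (p) { return p.match; })) return;
            var fragment = document.createDocumentFragment();
            parts.forEach(function (p) {
                if (p.match) {
                    var a = document.createElement('a');
                    a.href = href;
                    a.textContent = p.text; // keeps the matched casing
                    fragment.appendChild(a);
                } else {
                    fragment.appendChild(document.createTextNode(p.text));
                }
            });
            context.replaceChild(fragment, node);
        } else if (node.nodeType === 1 &&
                   !/^(A|H[1-6]|SCRIPT|STYLE)$/.test(node.nodeName)) {
            // Recurse into elements, skipping existing links and headers.
            linkifyText(node, regEx, href);
        }
    });
}
```

In a browser this could be called as linkifyText(context, /\bmomentum\b/gi, '#'). Because it never touches innerHTML, attributes and comments are left alone, and skipping A elements avoids nesting a link inside another link.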

Thank you very much! This works well, but still a problem remains: one can't replace the string with a tag, as it won't be interpreted... — Sean Bone, Dec 23 '13 at 08:46
Because you're using the `nodeValue` property, which acts on the text node. This is a problem I didn't at first recognize — Sean Bone, Dec 23 '13 at 08:49

score 1 · Accepted Answer · edited May 23 '17 at 10:32

As per my understanding, you're trying to replace text in innerHTML, but not the text inside the tags themselves.

First I tried to use innerText instead of innerHTML, but it did not give the expected result. Later I found @Alan Moore's answer, which uses a negative lookahead regex like

(?![^<>]*>)

which can be used to ignore the text within tags <>. Here is my approach:

var regEx = new RegExp("(?![^<>]*>)" + text, 'gi'); // text is the word to link, as in the question
context.innerHTML = context.innerHTML.replace(regEx, '<a href="#">'+text+"</a>");

Here is a sample JSFiddle
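To see what the lookahead does without a page, here is a small sketch that applies the same idea to a plain string. The helper names escapeRegExp and linkOutsideTags and the sample markup are assumptions for illustration. The word boundaries come from the question's regex, and the function replacement is a small variation that keeps the casing of the matched word.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every occurrence of `word` in a link, unless it sits inside a tag.
function linkOutsideTags(html, word) {
    // (?![^<>]*>) fails when the next angle bracket ahead is a closing ">",
    // which is taken to mean the current position is inside a tag.
    var re = new RegExp('(?![^<>]*>)\\b' + escapeRegExp(word) + '\\b', 'gi');
    return html.replace(re, function (match) {
        return '<a href="#">' + match + '</a>';
    });
}

var sample = '<img title="momentum diagram" src="p.png"> Conservation of Momentum';
console.log(linkOutsideTags(sample, 'momentum'));
// → <img title="momentum diagram" src="p.png"> Conservation of <a href="#">Momentum</a>
```

This is a heuristic rather than an HTML parser: a literal ">" in the visible text or inside an attribute value can mislead the lookahead. Assigning to innerHTML also rebuilds the element's children, so event listeners attached to them are lost.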

answered Dec 22 '13 at 15:20

Praveen

Thank you very much for your answer! This works very well, though this level of regexp is far beyond my knowledge. – Sean Bone Dec 23 '13 at 08:52
Can you think of a way to exclude specific tags as well? E.g. I wouldn't want to add a link within another link... – Sean Bone Dec 23 '13 at 08:53
