I suggest not relying on a regular expression for the whole task. The problem is a bit too complex for that: it can be done with a regex alone, but the result will not be pretty and it is easy to miss edge cases.
So, here is how I would do it:
var input = "this is first <a>this is second</a> this is third";
//a regex is still used, but only for a very small part of the solution, hence a very small regex. It matches the whole word "is" but not the "is" inside "this"
var regex =/\bis\b/ig;
//this is a new unattached element that will be used to leverage the DOM parsing of the browser
var scratchPad = document.createElement("div");
//this will set the content of the <div> tag and it will be parsed as HTML
scratchPad.innerHTML = input;
//no need to parse the tags manually - we can just do it like this
var allTags = Array.from(scratchPad.getElementsByTagName("a"));
//iterate and modify the elements in place
allTags.forEach(function(el) {
  el.innerHTML = el.innerHTML.replace(regex, "!FOO!");
});
//see the results
console.log(scratchPad.innerHTML);
This is lengthy, but only to illustrate what happens at each step.
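To make the word-boundary behaviour concrete, here is a small check of what `\b` does and does not match. It is only a sketch on plain strings (no DOM involved); the variable names are made up for this illustration.

```javascript
// Same pattern as above: "is" as a whole word, case-insensitive, global.
var wordIs = /\bis\b/ig;

// "this" contains "is", but there is no word boundary before it,
// so only the standalone word is replaced.
var plain = "this is second".replace(wordIs, "!FOO!");
console.log(plain); // this !FOO! second

// In "isn't" the "is" is followed by "n", a word character, so there is no match.
var contraction = "this isn't matched".replace(wordIs, "!FOO!");
console.log(contraction); // this isn't matched

// An apostrophe is not a word character, so "it" in "it's" does count as a whole word.
var wordIt = /\bit\b/ig;
var possessive = "it's here".replace(wordIt, "!FOO!");
console.log(possessive); // !FOO!'s here
```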
Here is a more realistic example of how this approach might be used:
function replacer(input, replaceWhat, replaceWith, inTag) {
  var regex = new RegExp("\\b" + replaceWhat + "\\b", "ig");
  var scratchPad = document.getElementById("_replacingDiv");
  //if not there, create it and attach it, so it's available next time
  if (!scratchPad) {
    scratchPad = document.createElement("div");
    scratchPad.id = "_replacingDiv";
    scratchPad.hidden = true;
    document.body.appendChild(scratchPad);
  }
  scratchPad.innerHTML = input;
  var tags = scratchPad.getElementsByTagName(inTag);
  Array.prototype.forEach.call(tags, function(el) {
    el.innerHTML = el.innerHTML.replace(regex, replaceWith);
  });
  return scratchPad.innerHTML;
}
var inputSimple = "this is first <a>this is second</a> this is third";
var inputComplex = "this is first <a>this is second</a> \n"+
  "this is third <a>this is fourth</a> \n"+
  "this is fifth <a>no match sixth</a> \n"+
  "this isn't matched seventh <a>this isn't matched eighth</a> \n"+
  "multiple is is example ninth <a>multiple is is example tenth </a>";
console.log(replacer(inputSimple, "is", "!FOO!", "a"));
console.log(replacer(inputComplex, "is", "!FOO!", "a"));
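One detail to watch in `replacer`: `replaceWhat` is concatenated into the pattern as-is, so a search term containing regex metacharacters (a dot, parentheses and so on) will be interpreted as a pattern rather than as literal text. Below is a minimal sketch of escaping it first. `escapeRegExp` and `buildWordRegex` are hypothetical helper names, not part of the function above, and the `\b` anchors still only make sense when the term starts and ends with a word character.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the same kind of whole-word regex that replacer uses,
// but with the search term treated as literal text.
function buildWordRegex(replaceWhat) {
  return new RegExp("\\b" + escapeRegExp(replaceWhat) + "\\b", "ig");
}

// Unescaped, the dot matches any character, so "axb" is replaced as well.
var unescaped = new RegExp("\\b" + "a.b" + "\\b", "ig");
var loose = "a.b axb".replace(unescaped, "!FOO!");
console.log(loose); // !FOO! !FOO!

// Escaped, only the literal "a.b" is replaced.
var strict = "a.b axb".replace(buildWordRegex("a.b"), "!FOO!");
console.log(strict); // !FOO! axb
```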
This should not be taken as the final form of the function, but from here you can tailor it to your needs. Here are some things that might need changing:
- One obvious improvement is to pass a configuration object, although you might also need fewer parameters, in which case it is less necessary.
- Do you want to consider any match valid, or only whole words? Right now it only handles whole words, but note that if you want to replace "it", it will also be replaced inside "it's", which might or might not suit your needs.
- If the input is driven by the user, then this specific approach might be vulnerable, as it will be interpreting untrusted content. In this case you can use a sandboxed iframe as the scratchPad instead, which will prevent any malicious code from being run. However, you will still need to deal with the output securely.
- If the replacement is driven by the user, the same applies as above.
- This will only really work if the tags you are replacing in do not have other tags nested in them. For example, you might want to replace every occurrence of "foo" in a <span>, but that tag contains a nested <a>. If that's the case, it's up to you to determine what exactly should happen: whether the inner tags are considered or skipped. Be careful not to replace anything inside the HTML markup itself, say, the attribute value in <a class="foo">.
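For the nested-tag and attribute concerns in the last point, one possible direction is to stop going through innerHTML for the replacement and walk the text nodes instead, so only visible text is ever changed. This is a rough sketch, not a drop-in replacement for the function above: `replaceInTextNodes` and its `skipTags` parameter are names invented here, and it assumes an HTML document, where `tagName` is reported in upper case.

```javascript
// Hypothetical helper: apply the replacement to text nodes only, so tag names
// and attribute values such as class="foo" are never touched.
// skipTags is an assumed option: upper-case tag names whose contents are left alone.
function replaceInTextNodes(node, regex, replaceWith, skipTags) {
  var ELEMENT_NODE = 1;
  var TEXT_NODE = 3;
  Array.prototype.forEach.call(node.childNodes, function (child) {
    if (child.nodeType === TEXT_NODE) {
      child.nodeValue = child.nodeValue.replace(regex, replaceWith);
    } else if (child.nodeType === ELEMENT_NODE &&
               (skipTags || []).indexOf(child.tagName) === -1) {
      // Descend into nested elements unless the caller asked to skip them.
      replaceInTextNodes(child, regex, replaceWith, skipTags);
    }
  });
}

// Browser-only usage; guarded so the snippet also loads where there is no DOM.
if (typeof document !== "undefined") {
  var scratchPad = document.createElement("div");
  scratchPad.innerHTML = '<span class="foo">foo <a class="foo">foo</a> foo</span>';
  var spans = scratchPad.getElementsByTagName("span");
  Array.prototype.forEach.call(spans, function (el) {
    // Replace "foo" in the span's own text, but leave nested <a> contents alone.
    replaceInTextNodes(el, /\bfoo\b/ig, "!BAR!", ["A"]);
  });
  console.log(scratchPad.innerHTML);
}
```

Omitting `skipTags` makes the walk descend into every nested element, which corresponds to the "inner tags are considered" choice; passing tag names corresponds to "skipped".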