Don't use regular expressions to parse HTML; HTML is far too complex for them.
You've said your starting point is a paragraph element. That means you already have a nicely parsed version of what you want to search. Look through the paragraph's descendant nodes for Text nodes. For each Text node, see whether it contains the word or words you're looking for, then look at its `parentNode.tagName` to see whether it's in an `a` element (perhaps looping up through its ancestors to handle the `<a href="#xyz"><span>target word</span></a>` case).
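The parent-walking check can also be written by hand instead of with `closest`. This is a minimal sketch: `isInsideLink` is a hypothetical helper name, it only reads the standard `parentNode` and `tagName` properties, and it stops when it reaches the paragraph, so an `<a>` wrapping the whole paragraph is ignored.

```javascript
// Hypothetical helper: walks up from a Text node's parent toward `para`,
// reporting whether any element along the way is an <a>.
function isInsideLink(textNode, para) {
    for (let node = textNode.parentNode; node && node !== para; node = node.parentNode) {
        // tagName is upper-case for HTML elements in HTML documents, but may be
        // lower-case in XML/XHTML documents, so normalise it before comparing.
        if (node.tagName && node.tagName.toUpperCase() === "A") {
            return true;
        }
    }
    return false;
}
```

Because every ancestor up to the paragraph is checked, text in `<a href="#xyz"><span>target word</span></a>` counts as inside a link even though its direct parent is a `span`.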
For example, here my target word is "example":
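```javascript
function findMatches(target, para, element = para) {
    let child = element.firstChild;
    while (child) {
        if (child.nodeType === Node.TEXT_NODE) {
            if (child.nodeValue.includes(target)) {
                // Is this text inside an <a> that is itself inside the paragraph?
                const a = child.parentNode.closest("a");
                if (!a || !para.contains(a)) {
                    console.log(`Found in '${child.nodeValue}'`);
                }
            }
        } else if (child.nodeType === Node.ELEMENT_NODE) {
            // Recurse so text inside nested elements (e.g. <a>, <span>) is checked too
            findMatches(target, para, child);
        }
        child = child.nextSibling;
    }
}
findMatches("example", document.getElementById("theParagraph"));
```

```html
<p id="theParagraph">This example matches, but <a href="#">this example</a> and <a href="#"><span>this example</span></a> don't match.</p>
```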
That example uses ES2015+ features (default parameters, `let`/`const`, template literals, `String.prototype.includes`) and modern browser features like `closest`, but it can be written in ES5 (and `closest` can be polyfilled).
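A minimal sketch of the ES5 route, assuming you'd rather avoid `closest` than polyfill it: it uses only `var`, function declarations, `indexOf`, and the standard `firstChild`/`nextSibling`/`parentNode`/`nodeName` properties. `findMatchesES5` is a hypothetical name, and returning an array of the matching text instead of logging it is a choice made for this sketch.

```javascript
// ES5-compatible sketch: no default parameters, let/const, template literals,
// String#includes, or Element#closest.
function findMatchesES5(target, para) {
    var matches = [];

    // True if `node` or any ancestor below `para` is an <a> element.
    function isInsideLink(node) {
        while (node && node !== para) {
            if (node.nodeType === 1 && node.nodeName.toUpperCase() === "A") {
                return true;
            }
            node = node.parentNode;
        }
        return false;
    }

    // Depth-first walk over every descendant of `element`.
    function walk(element) {
        var child = element.firstChild;
        while (child) {
            if (child.nodeType === 3) { // Text node
                if (child.nodeValue.indexOf(target) !== -1 && !isInsideLink(child.parentNode)) {
                    matches.push(child.nodeValue);
                }
            } else if (child.nodeType === 1) { // Element node: look inside it
                walk(child);
            }
            child = child.nextSibling;
        }
    }

    walk(para);
    return matches;
}
```

With the `theParagraph` markup from the example, `findMatchesES5("example", document.getElementById("theParagraph"))` should return an array holding only `"This example matches, but "`.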