I'm trying to write some JavaScript that traverses an HTML document and picks out words listed in a JSON array. When a word matches, the script should wrap it in <a href='glossary/#[matched text]'>[matched text]</a> and render the result to the screen.
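To make the goal concrete, here is a minimal sketch of the match-and-wrap step on a plain string only (no HTML awareness yet). It assumes the JSON array is a flat list of strings; the `glossary` array and the function names are made up for illustration.

```javascript
// Hypothetical glossary; the real one would come from the JSON array.
const glossary = ['text', 'regex'];

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one case-insensitive, whole-word pattern covering all terms.
function buildTermRegex(terms) {
  return new RegExp('\\b(' + terms.map(escapeRegExp).join('|') + ')\\b', 'gi');
}

// Wrap every matched word in a glossary link (plain-text input only).
function linkify(text, terms) {
  return text.replace(buildTermRegex(terms), (m) =>
    `<a href='glossary/#${m}'>${m}</a>`);
}

console.log(linkify('Some text here', glossary));
// prints: Some <a href='glossary/#text'>text</a> here
```

The `\b` anchors keep a term like "text" from matching inside "context"; they do nothing about surrounding markup, which is the part I'm stuck on.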
I have that part more or less working. Where I'm falling over is how best to tell the system to ignore certain elements (i.e. text already inside a link, buttons, inputs, element attributes, etc.). I've tried to solve this within the regex itself and have fumbled my way to the following:
/(?<!<(a|button|submit|pre|img|svg|path|h[0-9]|.*data-ignore.*>|input\/>|textarea|pre|code))((?<!(="|data-))\btext\b(?!"))(?!<\/(a|button|submit|pre|img|svg|path|h[0-9])>)/gi
(text is the word I'm trying to auto-link) - https://regex101.com/r/u7cLPR/1
If you follow the Regex101 link you'll see I think I've covered all bases bar one: when the word occurs inside a class='' attribute (and therefore inside other attributes such as style).
Any help would be greatly appreciated. As always with regex, I seem to either miss the mark or over-complicate the solution. Is regex even the right tool for the job here?
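For comparison, a rough sketch of a non-regex route: walk only the DOM's text nodes with `document.createTreeWalker`, and skip any text node that has an ignored ancestor. Attribute values such as class='' are never text nodes, so they would be left alone automatically. This is untested against my real pages; `IGNORE_SELECTOR`, `splitByTerms` and `linkGlossaryTerms` are hypothetical names, and the selector list is just my guess at what should be excluded.

```javascript
// Hypothetical list of elements whose text should be left untouched.
const IGNORE_SELECTOR =
  'a, button, input, textarea, pre, code, svg, h1, h2, h3, h4, h5, h6, [data-ignore]';

// Pure helper: split a string into plain and matched segments.
function splitByTerms(text, terms) {
  if (terms.length === 0) return [{ text, match: false }];
  const escaped = terms.map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  const pattern = new RegExp('\\b(?:' + escaped.join('|') + ')\\b', 'gi');
  const parts = [];
  let last = 0;
  for (const m of text.matchAll(pattern)) {
    if (m.index > last) parts.push({ text: text.slice(last, m.index), match: false });
    parts.push({ text: m[0], match: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) parts.push({ text: text.slice(last), match: false });
  return parts;
}

// Browser-only: replace matching text nodes under `root` with linked fragments.
function linkGlossaryTerms(root, terms) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
    acceptNode(node) {
      const parent = node.parentElement;
      return parent && parent.closest(IGNORE_SELECTOR)
        ? NodeFilter.FILTER_REJECT
        : NodeFilter.FILTER_ACCEPT;
    },
  });

  // Collect first, so the tree is not modified while it is being walked.
  const nodes = [];
  while (walker.nextNode()) nodes.push(walker.currentNode);

  for (const node of nodes) {
    const parts = splitByTerms(node.nodeValue, terms);
    if (!parts.some((p) => p.match)) continue;
    const frag = document.createDocumentFragment();
    for (const part of parts) {
      if (part.match) {
        const a = document.createElement('a');
        a.href = 'glossary/#' + encodeURIComponent(part.text);
        a.textContent = part.text;
        frag.appendChild(a);
      } else {
        frag.appendChild(document.createTextNode(part.text));
      }
    }
    node.replaceWith(frag);
  }
}
```

In a page this would be called as something like `linkGlossaryTerms(document.body, glossary)` once the JSON array has loaded. Whether this is preferable to extending the regex is exactly what I'm unsure about.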