Match regex only if not part of hyperlink

Question

I need a regex that matches text only if it is not part of a hyperlink, but it can be part of a <p> tag.

e.g.

<p>
bla bla bla textToMatch blabla
</p>

would match textToMatch

but

<a href="http://www.google.com" alt="textToMatch">bla textToMatch</a>

would be ignored

I tried a number of articles to work this out, but had no luck.

Don't parse HTML with regex. [Parsing HTML with regex is a hard job](https://stackoverflow.com/a/4234491/372239) — Toto, May 20 '19 at 09:32
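As a rough sketch of the parser-based route this comment points to, the following uses Python's standard-library html.parser. The class name TextOutsideLinks and the helper find_outside_links are made up for illustration, and the sketch assumes the input is reasonably well-formed HTML.

```python
from html.parser import HTMLParser


class TextOutsideLinks(HTMLParser):
    """Collect text chunks that contain a target string and are not inside <a>."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.anchor_depth = 0  # greater than 0 while inside an <a> element
        self.hits = []         # text chunks outside links that contain the target

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Attribute values (e.g. alt="textToMatch") never reach handle_data,
        # so only visible text is inspected here.
        if self.anchor_depth == 0 and self.target in data:
            self.hits.append(data.strip())


def find_outside_links(html, target):
    """Hypothetical helper: return the text chunks outside links containing target."""
    parser = TextOutsideLinks(target)
    parser.feed(html)
    parser.close()
    return parser.hits


sample = (
    "<p>\nbla bla bla textToMatch blabla\n</p>\n"
    '<a href="http://www.google.com" alt="textToMatch">bla textToMatch</a>'
)
print(find_outside_links(sample, "textToMatch"))
```

Because the parser tracks whether it is inside an anchor, text nested deeper in a link (for example inside a <b> within an <a>) is skipped as well.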

score -1 · Answer 1 · answered May 20 '19 at 10:03

Try the following regex:

<[^\/a] *[^>]*>[^<]*(textToMatch)

Details:

< - < (literally) - start of a tag,
[^\/a] - a single character other than a (to exclude the anchor tag) or / (to exclude any closing tag); note that this also excludes any other tag whose name starts with a,
* - (a space followed by an asterisk) - zero or more spaces,
[^>]*> - a possibly empty sequence of chars other than > (inner part of the opening tag) and > (closing the opening tag),
[^<]* - a possibly empty sequence of chars other than < (no other opening/closing tag),
(textToMatch) - the text to match as a capturing group.

This way the "preceding stuff" is part of the overall match, but the text you actually want is in capturing group 1.
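A minimal sketch of how the pattern could be used, assuming Python's re module (the question does not name a regex flavor). The sample strings are taken from the question; the nested string is an added illustration of a case the pattern does not handle.

```python
import re

# The pattern from this answer; group 1 holds the text of interest.
pattern = re.compile(r"<[^\/a] *[^>]*>[^<]*(textToMatch)")

paragraph = "<p>\nbla bla bla textToMatch blabla\n</p>"
link = '<a href="http://www.google.com" alt="textToMatch">bla textToMatch</a>'
nested = '<a href="#"><b>textToMatch</b></a>'  # added illustration, not from the question

in_paragraph = pattern.search(paragraph)
in_link = pattern.search(link)
in_nested = pattern.search(nested)

# The whole match includes the opening tag; group 1 is only the wanted text.
print(in_paragraph.group(1), in_paragraph.span(1))

# The plain link is skipped, as intended.
print(in_link)

# Caveat: text inside another tag nested within a link is still matched,
# because only the nearest preceding tag is checked.
print(in_nested is not None)
```

The last case shows why the pattern is only an approximation: it looks at the closest opening tag, not at whether an enclosing <a> is still open.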

The "preceding stuff" cannot be moved into a lookbehind, because in most regex flavors a lookbehind must have a fixed length.
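Python's built-in re module is one engine with this restriction, so it can serve as a quick check. The helper name compiles is hypothetical, and other engines may behave differently.

```python
import re


def compiles(regex):
    """Return True if Python's re module accepts the pattern, False otherwise."""
    try:
        re.compile(regex)
        return True
    except re.error:
        return False


# The "preceding stuff" placed in a lookbehind has a variable length.
variable_lookbehind = r"(?<=<[^\/a] *[^>]*>[^<]*)textToMatch"

# A lookbehind for a literal tag has a fixed length of three characters.
fixed_lookbehind = r"(?<=<p>)textToMatch"

print(compiles(variable_lookbehind))
print(compiles(fixed_lookbehind))
```

The fixed-length variant compiles, but it only matches text that directly follows the tag, which is why the answer uses a capturing group instead.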
