While using regexps with HTML is bad, matching a string that does not contain a given pattern is an interesting question in itself.
Let's assume that we want to match a string beginning with an "a" and ending with a "z", and take out whatever is in between only when the string "bar" is not found inside.
Here's my take: "a((?:(?<!ba)r|[^r])+)z"
It basically says: find an "a", then find either an "r" that is not preceded by "ba", or any character other than "r" (repeat at least once), then find a "z". So "bar" cannot sneak into the capture group.
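To see the pattern in action, here is a minimal Python sketch. Python's re module and the helper name between are my own choices for illustration; the pattern itself is unchanged.

```python
import re

# The pattern from above: "a", then one or more characters that never
# complete the substring "bar", then "z".
PATTERN = re.compile(r"a((?:(?<!ba)r|[^r])+)z")


def between(text):
    """Return the captured middle part, or None if the pattern does not match.

    'between' is a hypothetical helper name used only for this example.
    """
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(between("a foo z"))  # ' foo '
print(between("a car z"))  # ' car ' (an "r" not preceded by "ba" is fine)
print(between("a bar z"))  # None
```

One caveat: the lookbehind also sees characters outside the capture group, including the leading "a". So between("barz") returns None even though the text between "a" and "z" is just "r".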
Note that this approach uses a 'negative lookbehind' pattern, and many regex engines only accept lookbehind patterns of fixed length (like "ba").
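As a quick illustration of the fixed-length restriction, the sketch below assumes Python's built-in re module, which rejects variable-width lookbehind at compile time. Other engines may behave differently, and the helper name compiles is hypothetical.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern, else False."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(r"(?<!ba)r"))   # True: "ba" always has length 2
print(compiles(r"(?<!ba+)r"))  # False: "ba+" can match strings of varying length
```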