Besides using look-arounds, you can do this with standard capturing groups.
First, define what you're looking for...
- Find abc.com when it appears within tags (enclosed within > and <) and optionally has additional dotted prefixes and/or suffixes.
- Replace abc.com with random when found in these circumstances.
This looks to me like a multi-level domain with two or more segments, which abc.com must be a part of.
What are the parts we're looking for?
- The fixed string abc.com: the re for that is abc\.com, escaping the dot so it's a literal instead of "any character".
- Optional domain parts preceding "abc.com": letters followed by a dot, [a-z]+\. ... but there can be zero or more of them, so ([a-z]+\.)*
- Optional domain parts following "abc.com": a dot followed by letters, zero or more times, (\.[a-z]+)*
- All of that enclosed within the end of a start tag, >, and the start of an end tag, <, so >something<
Putting that all together we get >([a-z]+\.)*abc\.com(\.[a-z]+)*<
which needs each backslash doubled to be a Java string: ">([a-z]+\\.)*abc\\.com(\\.[a-z]+)*<"
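As a quick check of the escaping, here is a minimal sketch that compiles the Java string and tests whether an input contains a match. The class name DomainPatternDemo and the helper containsDomain are made up for illustration; only Pattern.compile and Matcher.find come from java.util.regex.

```java
import java.util.regex.Pattern;

class DomainPatternDemo {
    // The regex from above, with each backslash doubled for the Java string literal.
    static final Pattern DOMAIN =
            Pattern.compile(">([a-z]+\\.)*abc\\.com(\\.[a-z]+)*<");

    // Hypothetical helper: true if abc.com (with optional dotted
    // prefix/suffix) appears between a '>' and a '<' in the input.
    static boolean containsDomain(String input) {
        return DOMAIN.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsDomain("<abc=\"xyz\">abc.com</abc>"));     // true
        System.out.println(containsDomain("<abc=\"xyz\">abc.commmmm</abc>")); // false
    }
}
```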
Now, since we are matching and consuming the > and <, we'll need them in the replacement string. We're also capturing the prefix and suffix, so we need to put those in the replacement too, using capturing groups 1 and 2, giving >$1random$2<
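In Java this could look something like the sketch below, which uses Matcher.replaceAll with that replacement string. The class name DomainReplaceDemo and the method replaceDomain are hypothetical names chosen for the example.

```java
import java.util.regex.Pattern;

class DomainReplaceDemo {
    static final Pattern DOMAIN =
            Pattern.compile(">([a-z]+\\.)*abc\\.com(\\.[a-z]+)*<");

    // Hypothetical helper: puts back the consumed '>' and '<' and
    // re-inserts the captured prefix ($1) and suffix ($2) around "random".
    static String replaceDomain(String input) {
        return DOMAIN.matcher(input).replaceAll(">$1random$2<");
    }

    public static void main(String[] args) {
        // prints <abc="xyz">text.random.net</abc>
        System.out.println(replaceDomain("<abc=\"xyz\">text.abc.com.net</abc>"));
    }
}
```

A group that did not take part in the match (no prefix or no suffix) is replaced by an empty string, so plain abc.com simply becomes random.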
I tried this re at regexplanet.com (http://fiddle.re/fxhfxn) with these test strings, which produced the following replacements:
<abc="xyz">abc.com</abc> => <abc="xyz">random</abc>
<abc="xyz">abc.commmmm</abc> => <abc="xyz">abc.commmmm</abc>
<abc="xyz">text.abc.com.net</abc> => <abc="xyz">text.random.net</abc>
[abc="xyz"]text.abc.com.net[/abc] => [abc="xyz"]text.abc.com.net[/abc]
<abc="xyz">abc-com</abc> => <abc="xyz">abc-com</abc>
<abc="xyz">abc.com.fr</abc> => <abc="xyz">random.fr</abc>
<abc="xyz">www.abc.com</abc> => <abc="xyz">www.random</abc>
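One case these test strings don't cover is more than one prefix or suffix segment. A repeated capturing group keeps only its last repetition, so with the pattern above the earlier segments would be dropped from the replacement. A possible variant, sketched here as an assumption rather than something from the fiddle, wraps each repetition in an outer capturing group and makes the inner group non-capturing. The names MultiSegmentDemo, ORIGINAL, WHOLE_RUN and replaceWith are made up for the example.

```java
import java.util.regex.Pattern;

class MultiSegmentDemo {
    // Pattern from the answer: groups 1 and 2 hold only the LAST repetition.
    static final Pattern ORIGINAL =
            Pattern.compile(">([a-z]+\\.)*abc\\.com(\\.[a-z]+)*<");

    // Assumed variant: the outer groups capture the whole run of segments,
    // the inner (?:...) groups only do the repeating.
    static final Pattern WHOLE_RUN =
            Pattern.compile(">((?:[a-z]+\\.)*)abc\\.com((?:\\.[a-z]+)*)<");

    static String replaceWith(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll(">$1random$2<");
    }

    public static void main(String[] args) {
        String input = "<abc=\"xyz\">www.text.abc.com.co.uk</abc>";
        // prints <abc="xyz">text.random.uk</abc> ("www." and ".co" are lost)
        System.out.println(replaceWith(ORIGINAL, input));
        // prints <abc="xyz">www.text.random.co.uk</abc>
        System.out.println(replaceWith(WHOLE_RUN, input));
    }
}
```

For inputs with at most one prefix and one suffix segment, such as the test strings above, both patterns give the same result.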