Try it this way:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "aaaabbbaaaaab";
Matcher m = Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)").matcher(data);
while (m.find()) {
    System.out.println(m.group(1));
}
This regex uses lookaround mechanisms and will find (a+b+|b+a+) (captured in group 1) that
- exists at the start (^) of the input,
- starts with a b that is preceded by an a, or
- starts with an a that is preceded by a b.
Output:
aaaabbb
bbbaaaaa
aaaaab
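If you want to reuse this, a minimal sketch is to wrap the loop in a method that collects group 1 into a list. The class name AlternatingRuns and the method findRuns are just hypothetical names for illustration; the second call shows the same regex on another input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternatingRuns {
    // Same regex as above: the lookahead captures the whole "a-run + b-run"
    // or "b-run + a-run" in group 1, group 2 only decides where a match may start.
    private static final Pattern RUNS =
            Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)");

    // Hypothetical helper: collects group 1 of every match, in order.
    static List<String> findRuns(String data) {
        List<String> result = new ArrayList<>();
        Matcher m = RUNS.matcher(data);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findRuns("aaaabbbaaaaab")); // [aaaabbb, bbbaaaaa, aaaaab]
        System.out.println(findRuns("bbaab"));         // [bbaa, aab]
    }
}
```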
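Is ^ really needed in this regular expression?
Yes. Without ^ this regex wouldn't capture aaaabbb, which is placed at the start of the input: the a at index 0 has nothing before it, so neither (?<=a)b nor (?<=b)a can match there.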
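A quick way to check this is to run both versions side by side. This is a small sketch; CaretDemo and captures are hypothetical names, and the only difference between the two patterns is the ^ alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaretDemo {
    // Hypothetical helper: returns group 1 of every match of regex in data.
    static List<String> captures(String regex, String data) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(data);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "aaaabbbaaaaab";
        // With ^: the run starting at index 0 is accepted because ^ matches there.
        System.out.println(captures("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)", data));
        // prints [aaaabbb, bbbaaaaa, aaaaab]

        // Without ^: index 0 has no preceding character, so both lookbehinds fail.
        System.out.println(captures("(?=(a+b+|b+a+))((?<=a)b|(?<=b)a)", data));
        // prints [bbbaaaaa, aaaaab]
    }
}
```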
If I didn't add (^|(?<=a)b|(?<=b)a) after (?=(a+b+|b+a+)), this regex would match at every position where the lookahead succeeds and capture
aaaabbb
aaabbb
aabbb
abbb
bbbaaaaa
bbaaaaa
baaaaa
aaaaab
aaaab
aaab
aab
ab
so I needed to limit these results to only those that start with an a that has a b before it (without including that b in the match, which is exactly what lookbehind is for) or with a b that is preceded by an a.
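To see the unfiltered behaviour described above, here is a sketch that uses only the lookahead part. Every match is zero-length, and Matcher.find() moves one character forward after an empty match, so the lookahead is tried at every index. LookaheadOnly and captures are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadOnly {
    // Only the lookahead, without the part that restricts where a match may start.
    private static final Pattern LOOKAHEAD = Pattern.compile("(?=(a+b+|b+a+))");

    // Hypothetical helper: group 1 of every (zero-length) match.
    static List<String> captures(String data) {
        List<String> result = new ArrayList<>();
        Matcher m = LOOKAHEAD.matcher(data);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints one suffix per line, from aaaabbb down to ab (12 lines).
        for (String s : captures("aaaabbbaaaaab")) {
            System.out.println(s);
        }
    }
}
```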
But let's not forget about an a or b that is placed at the start of the string and is not preceded by anything. To include it we can use ^.
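One way to see what the second group does is to print where each match starts. In this sketch (MatchPositions and matchStarts are hypothetical names) the full regex only matches at index 0 (thanks to ^), at index 4 (a b preceded by an a) and at index 7 (an a preceded by a b), i.e. exactly where a new run begins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPositions {
    private static final Pattern RUNS =
            Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)");

    // Hypothetical helper: start index of every match.
    static List<Integer> matchStarts(String data) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = RUNS.matcher(data);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // 0 comes from ^, 4 from (?<=a)b, 7 from (?<=b)a
        System.out.println(matchStarts("aaaabbbaaaaab")); // [0, 4, 7]
    }
}
```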
Maybe it will be easier to show this idea with this regex:
(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)
(?<=^|a)b will match a b that is placed at the start of the string or has an a before it.
(?<=^|b)a will match an a that is placed at the start of the string or has a b before it.
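Here is a sketch checking that this variant gives the same results as the first regex. The difference is that here group 2 always consumes exactly one character (the first character of the run), while in the first regex the ^ branch matches an empty string. LookbehindVariant and findRuns are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindVariant {
    // ^ moved inside the lookbehinds: "start of input, or the opposite letter before me".
    private static final Pattern VARIANT =
            Pattern.compile("(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)");

    // Hypothetical helper: group 1 of every match.
    static List<String> findRuns(String data) {
        List<String> result = new ArrayList<>();
        Matcher m = VARIANT.matcher(data);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findRuns("aaaabbbaaaaab")); // [aaaabbb, bbbaaaaa, aaaaab]
        System.out.println(findRuns("bbaab"));         // [bbaa, aab]
    }
}
```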