What ((a*)(b*))*U(a*)
really means is (copied from here)
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1 (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
a* 'a' (0 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
b* 'b' (0 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
)* end of \1 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
--------------------------------------------------------------------------------
U 'U'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
a* 'a' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
) end of \4
This expression currently matches all of these sequences: abUa bU U aabbUaa aaUaa aaU Uaa bbU ababUaa aabbaabbUaa
(look at here)
There is no way to simplify this, without removing capturing groups and remaining order of letters.
EDIT: If U
in your regex statement stands for "union", then this expression is invalid. There is no way to union anything in regex. There is only OR
and you need to use |
(pipe) for that. If you want to union ((a*)(b*))*
and (a*)
then probably it will be ((a*)(b*))*
, but it will still match anything like abaab
.
Still, capturing groups in your regex statement are useless, so something like [ab]*
is enough to match any number of a
's and b
's.