In normal regular expression grammar, `(a+b)*` means zero or more repetitions of a sequence that starts with an `a`, continues with zero or more further `a` characters, and ends with a single `b`. This rules out things like `baa` (it doesn't start with `a`), `abba`, and `a` (there must be exactly one `b` after each group of `a` characters), so it is not correct.
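To see the regex reading in action, here is a minimal sketch using Python's `re` module. The helper name `accepts` and the sample strings are just illustrative choices, not anything from the question.

```python
import re

# Regex reading of (a+b)*: zero or more groups, each being
# "one or more a, followed by exactly one b".
PATTERN = re.compile(r"(a+b)*")

def accepts(text: str) -> bool:
    """Return True only if the whole string matches the pattern."""
    return PATTERN.fullmatch(text) is not None

for sample in ["", "ab", "aab", "abaab", "baa", "abba", "a"]:
    print(repr(sample), accepts(sample))
```

The first four samples are accepted, while `baa`, `abba` and `a` are rejected, which is why this reading does not cover every string over `a` and `b`.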
`(a*b*)*` means zero or more repetitions of a sequence that contains zero or more `a` followed by zero or more `b`. This is more correct since it allows either starting character, any order and quantity of characters, and so on. It also allows the empty string, which I'm pretty certain should be allowed by `Σ*` (but I'll leave that up to you).
However, it may be better to opt for the much simpler `[ab]*` (or `[ab]+` in the unlikely event you consider an empty string invalid). This is basically zero or more (one or more for the `+` variant) of any character drawn from the class `[ab]`.
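If you want to convince yourself that `(a*b*)*` and `[ab]*` accept the same strings, a brute-force check over short inputs is one option. This is only a sketch: the names `strings_over` and `same_language`, the extra `c` character (added so some rejections are exercised), and the length limit of 6 are all my own assumptions, and a bounded check is evidence rather than proof.

```python
import re
from itertools import product

NESTED = re.compile(r"(a*b*)*")
CLASS_STAR = re.compile(r"[ab]*")
CLASS_PLUS = re.compile(r"[ab]+")

def strings_over(alphabet: str, max_len: int):
    """Yield every string over `alphabet` of length 0..max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

def same_language(max_len: int = 6) -> bool:
    """Compare both patterns on all short strings over {a, b, c}."""
    return all(
        (NESTED.fullmatch(s) is not None) == (CLASS_STAR.fullmatch(s) is not None)
        for s in strings_over("abc", max_len)
    )

print(same_language())                        # both patterns agree on every tested string
print(CLASS_STAR.fullmatch("") is not None)   # [ab]* accepts the empty string
print(CLASS_PLUS.fullmatch("") is not None)   # [ab]+ does not
```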
However, since you're using `Σ`, it's possible that you're discussing formal language theory (where `Σ` is common) rather than regex grammar (where it tends not to be).
If that is the case, you should understand that there are variants of the formal notation where the `a | b` expression (effectively `[ab]` in regex grammar) is instead rendered as one of `a ∪ b`, `a ∨ b` or `a + b`, with each of those operator symbols representing "logical or".
That would mean that `(a+b)*` is actually correct for what you need (it is equivalent to the `[ab]*` regex I gave above), since it basically means any character from the set `{a, b}`, repeated zero or more times.
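As a rough illustration of the formal-language reading, the sketch below rewrites the alternation symbols as the regex `|` and then tests the result. The helper `formal_to_regex` is hypothetical and deliberately naive: it treats every `+` as alternation, so it is only meaningful for input written in the formal notation, never for an ordinary regex.

```python
import re

def formal_to_regex(expr: str) -> str:
    """Hypothetical helper: map formal alternation symbols to regex '|'."""
    for symbol in ("∪", "∨", "+"):
        expr = expr.replace(symbol, "|")
    return expr.replace(" ", "")

translated = formal_to_regex("(a + b)*")
print(translated)  # (a|b)*

pattern = re.compile(translated)
for sample in ["", "a", "baa", "abba"]:
    # Under this reading, every string over {a, b} is accepted.
    print(repr(sample), pattern.fullmatch(sample) is not None)
```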
That's also covered by your `(a*b*)*` option, but it's almost always better to choose the simplest one that does the job :-)
And just something else to keep in mind for the formal language case. In English (for example), "a" is a word, but you'd struggle to find anyone supporting the possibility that "" is also a word. Try looking it up in a dictionary :-)
In other words, any regular expression that allows an empty sequence of the language characters (such as `(a+b)*`) may not be suitable. You may find that `(a+b)(a+b)*` is a better option. This depends on whether the empty sequence should be allowed; under the usual Kleene-star definition, `Σ*` does include it.
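To make the difference concrete, here is a small sketch that lists the strings over `{a, b}` up to length 2 and filters them with the regex form of `(a+b)(a+b)*`. It assumes the usual Kleene-star definition, under which the empty string is a member; the names `SIGMA`, `sigma_star` and `NON_EMPTY` are my own.

```python
import re
from itertools import product

SIGMA = "ab"  # the alphabet, assumed to be {a, b}

def sigma_star(max_len: int) -> list[str]:
    """All strings over SIGMA of length 0..max_len, shortest first."""
    return [
        "".join(chars)
        for length in range(max_len + 1)
        for chars in product(SIGMA, repeat=length)
    ]

# Formal (a+b)(a+b)* written as a regex: at least one character from {a, b}.
NON_EMPTY = re.compile(r"(a|b)(a|b)*")

words = sigma_star(2)
print(words)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print([w for w in words if NON_EMPTY.fullmatch(w)])  # same list without ''
```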