5

I am going through exercises regarding regular expressions, and I am really unsure on how to do this.

The regular expression is:

((a*)(b*))* ∪ (a*)

I am really bad at this, but I think that ((a*)(b*))* can be simplified to (a ∪ b)* But if this is right, than the last ∪ (a*) is really just a repetition, so I figure the whole expression can be simplified to (a ∪ b)*. Does this seem correct?

Edit: ∪ stands for union

J D
  • 48,105
  • 13
  • 171
  • 274
user2795095
  • 501
  • 2
  • 9
  • 13
  • 2
    Does `U` stand for union? That is, with `(a U b)*` do you actually mean what would be expressed in regex as `(a|b)*` or `[ab]*`? Cause if so that pattern you have seems equal to just `(a|b)*` like you suggested. – Daniël Knippers May 09 '14 at 14:07
  • 1
    If `U` is supposed to mean "union", then it would be appropriate to use the proper symbol (`∪`) - or at least state it in the text. – Tomalak May 09 '14 at 14:12
  • @Tomalak, how do you use the union symbol? – perreal May 09 '14 at 14:25
  • @perreal It's [a normal Unicode character](http://www.fileformat.info/info/unicode/char/222a/index.htm), you can insert it verbatim. Of course, if all client fonts contain a representation of that character, that's a different matter. – Tomalak May 09 '14 at 14:28
  • 2
    Wouldn't that question better belong to [cs.stackexchange.com](http://cs.stackexchange.com/questions/tagged/regular-languages) as it's kind of confusing here :] – Jonny 5 May 09 '14 at 15:06
  • @Jonny5 I agree, this question should be closed or moved due to nature. – Deele May 09 '14 at 16:08

2 Answers2

4

You are right. (a*b*)* can match any string of a's and b's, so can (a U b)*, therefore they are equivalent. (a U b)* intersect a* is a* so a* is a subset of (a U b)*. Consequently, the whole expression can be simplified to (a U b)*.

perreal
  • 94,503
  • 21
  • 155
  • 181
  • You are wrong. `(a U b)*` will match literal `a U b` zero or more times. It is no longer what original regex was intended to match. – Deele May 09 '14 at 14:58
  • why do you say that? Union and OR are exactly the same thing. and this question is about formal regular languages. – perreal May 09 '14 at 15:00
  • They are not in context of regex. `(a | b)*` is wrong expression too. OP did not state that he is asking about _format regular languages_ or anything like that. He added `regex` tag and asked about regex. So question should be downvoted and you should use proper regex symbols, to answer regex questions. Every character used in expression counts. – Deele May 09 '14 at 15:05
  • Because it is: literal `a ` OR literal ` b`, zero or more times. Original regex was much more about `a`'s and `b`'s, not about abstract units. – Deele May 09 '14 at 16:07
-2

What ((a*)(b*))*U(a*) really means is (copied from here)

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (                        group and capture to \1 (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    (                        group and capture to \2:
--------------------------------------------------------------------------------
      a*                       'a' (0 or more times (matching the
                               most amount possible))
--------------------------------------------------------------------------------
    )                        end of \2
--------------------------------------------------------------------------------
    (                        group and capture to \3:
--------------------------------------------------------------------------------
      b*                       'b' (0 or more times (matching the
                               most amount possible))
--------------------------------------------------------------------------------
    )                        end of \3
--------------------------------------------------------------------------------
  )*                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
--------------------------------------------------------------------------------
  U                        'U'
--------------------------------------------------------------------------------
  (                        group and capture to \4:
--------------------------------------------------------------------------------
    a*                       'a' (0 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
  )                        end of \4

This expression currently matches all of these sequences: abUa bU U aabbUaa aaUaa aaU Uaa bbU ababUaa aabbaabbUaa (look at here)

There is no way to simplify this, without removing capturing groups and remaining order of letters.

EDIT: If U in your regex statement stands for "union", then this expression is invalid. There is no way to union anything in regex. There is only OR and you need to use | (pipe) for that. If you want to union ((a*)(b*))* and (a*) then probably it will be ((a*)(b*))*, but it will still match anything like abaab.

Still, capturing groups in your regex statement are useless, so something like [ab]* is enough to match any number of a's and b's.

Deele
  • 3,728
  • 2
  • 33
  • 51