2

Im learning about automata. Can you please help me understand how automata with Kleene closure works? Let's say I have letters a,b,c and I need to find text that ends with Kleene star - like ab*bac - how will it work?

Phonon
  • 12,549
  • 13
  • 64
  • 114
Anna
  • 59
  • 1
  • 2
  • 8

3 Answers3

4

The Kleene star('*') means you can have as many occurrences of the character as you want (0 or more). a* will match any number of a's.

(ab)* will match any number of the string "ab"

If you are trying to match an actual asterisk in an expression, the way you would write it depends entirely on the syntax of the regex you are working with. For the general case, the backwards slash \ is used as an escape character:

\* will match an asterisk.

For recognizing a pattern at the end, use concatenation:

(a U b)*c* will match any string that contains 0 or more 'c's at the end, preceded by any number of a's or b's.

For matching text that ends with a Kleene star, again, you can have 0 or more occurrences of the string:

ab(c)* - Possible matches: ab, abc abcc, abccc, etc.

a(bc)* - Possible matches: a, abc, abcbc, abcbcbc, etc.

Chris Dargis
  • 5,891
  • 4
  • 39
  • 63
  • I know what kleene star does but lets say I know how to build automata that recognize words that ends with a certain pattern - how an automata that reconize text that ends with kleene star will work? – Anna Jun 11 '12 at 20:00
  • For recognizing a pattern at the end, use concatenation: `(a U b)*c*` will match any string that contains 0 or more 'c's at the end, preceded by any number of a's or b's. – Chris Dargis Jun 11 '12 at 20:01
  • by reading again I think your definition for kleene star it's not what I mean ( the second answer is corrent) but Im not sure I got you right... – Anna Jun 12 '12 at 20:47
  • a* should be follow by any letter a,b,c – Anna Jun 12 '12 at 20:48
  • I can assure you the definition I gave for Kleene Star is correct. See http://en.wikipedia.org/wiki/Kleene_star. Also, a regular expression for a* followed by any letter a, b, or c: `a*(a U b U c)` – Chris Dargis Jun 13 '12 at 13:38
4

The question seems to be more about how an automaton would handle Kleene closure than what Kleene closure means.

With a simple regular expression, e.g., abc, it's pretty straightforward to design an automaton to recognize it. Each state essentially tells you where you are in the expression so far. State 0 means it's seen nothing yet. State 1 means it's seen a. State 2 means it's seen ab. Etc.

The difficulty with Kleene closure is that a pattern like ab*bc introduces ambiguity. Once the automaton has seen the a and is then faced with a b, it doesn't know whether that b is part of the b* or the literal b that follows it, and it won't know until it reads more symbols--maybe many more.

The simplistic answer is that the automaton simply has a state that literally means it doesn't know yet which path was taken.

In simple cases, you can build this automaton directly. In general cases, you usually build something called a non-deterministic finite automaton. You can either simulate the NDFA, or--if performance is critical--you can apply an algorithm that converts the NDFA to a deterministic one. The algorithm essentially generates all the ambiguous states for you.

Adrian McCarthy
  • 45,555
  • 16
  • 123
  • 175
  • `ab*bc` = `ab+c`. This really means, "a, least one b, c". I'm sorry, I don't find anything a ambiguous about that. After the automaton has seen the `a`, it is simply looking for at least one `b`, followed by a `c`. Once the automaton was in the state `ab+`, it would remain there upon receiving `b`s. The automaton could have only taken one path. – Chris Dargis Jun 13 '12 at 16:58
  • I understand the NFA vs DFA, but since any NFA can be converted to a DFA, then there is really only one path that accepts the string. – Chris Dargis Jun 13 '12 at 17:04
  • @VanDarg: Yes, you can translate the pattern in that manner and avoid the ambiguity. If you try to build a machine directly from the original regular expression, you get to a point where you don't know if the `b` you just saw is part of the `b*` or the `b`. So you need a state that copes with that ambiguity. – Adrian McCarthy Jun 13 '12 at 19:34
0

Your expression ab*bac in English would read something like:

a followed by 0 or more b followed by bac

strings that would evaluate as a match to the regular expression if used for search

abac
abbbbbbbbbbac
abbac

strings that would not match

abaca //added extra literal
bac //missing leading a

As stated in the previous answer actually searching for a * would require an escape character which is implementation specific and would require knowledge of your language/library of choice.

shaunhusain
  • 19,630
  • 4
  • 38
  • 51