4

We're learning about the difference between regular languages and regex, and the teacher explained that the language

a^n b^n

is not regular, but she said that most regex flavors can match

a^n A^n

and she gave this problem for our extra credit homework problem. We've been struggling for a couple of days now, and could really use some guidance.

Gilles 'SO- stop being evil'
  • 104,111
  • 38
  • 209
  • 254
user282886
  • 3,125
  • 8
  • 22
  • 11

1 Answers1

11

The teacher gave a HUGE hint by limiting the alphabet to {a, A}. The key to solving this problem is realizing that in case-insensitive mode, a matches A and vice versa. The other component of the problem is matching on backreference.

This pattern will match a{n}A{n} for some n (as seen on rubular.com):

^(?=(a*)A*$)\1(?i)\1$

How it works

The pattern works as follows:

  • ^(?=(a*)A*$) - Anchored at the beginning of the string, we use positive lookahead to see if we can match (a*)A* until the end of the string
    • If we can succesfully make this assertion, then \1 captures the sequence of a{n}
    • If the assertion fails, then there's no way the string can be a{n}A{n}, since it's not even a*A*
  • We then match \1(?i)\1$, that is, the a{n} captured by \1, then in case-insensitive mode (?i), we match a{n} again until the end of the string
  • Thus, since:
    • The string is a*A*
    • \1 is a{n}
    • \1(?i)\1 can only be a{n}A{n}

Related questions

References

See also

Community
  • 1
  • 1
polygenelubricants
  • 376,812
  • 128
  • 561
  • 623
  • 2
    For someone just learning the language theory, this little blurb is worth reading after this answer : http://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages – Donnie Jun 27 '10 at 14:32
  • +1, Nice solution. But one thing that confused me, how did you infer that 'a^n' means 'a' repeated n times? I didn't get that from the question, but I guess it makes sense. – Stephen Jun 27 '10 at 14:39
  • 1
    @Stephen: I believe it's a common notation in language theory. The question I linked to uses the same notation. I've modified my answer to use traditional regex notation. – polygenelubricants Jun 27 '10 at 14:43