3

Are there such things?

Like for example, S -> aSb | ^ (possible words: ^, ab, aabb, aaabbb, aaaabbbb, ...)

From what I've learned, the only regex that closely match the said grammar is: a*b*

But the regex can produce words such as aab, abb, ... where a's and b's aren't equal.

Is there a solution to this? Something like: a*b* if #a = #b

EDIT: I think there is no solution to this.

What is the correct explanation for this? This is actually a snippet of my homework, and I really don't know what to answer since there are no solutions in translating the grammar to regex.

2 Answers2

4

If you are talking about formal language theory then of course all non-regular grammars (as in your example) can't be expressed with a regular expression (per definition).

But if you are wondering of what different regex flavors (in programming languages/regex libs) can do, then you can match all kinds of non-regular grammars/languages.

For example in Perl/PCRE you can match your example language with any of these:

  • Using recursion/sub pattern calls:

    ^(a(?1)b)$

  • Using a backreference (with a conditional):

    ^(?:a(?=a*(b(?(1)\1))))+\1$|^$

You may be interested in this questions and answers: Match a^n b^n c^n (e.g. "aaabbbccc") using regular expressions (PCRE)

Community
  • 1
  • 1
Qtax
  • 33,241
  • 9
  • 83
  • 121
  • The question is actually related to our class, and I don't know if we can use symbols that are used in PCRE. The only symbols we used were * and + (kleene star and plus). – user1846682 Feb 05 '13 at 18:21
  • I really think that it cannot be translated into a regex. I now wonder what to answer in my homework. – user1846682 Feb 05 '13 at 18:22
  • @user1846682, your homework appears to be in formal language theory. In which case the first sentence of the answer applies. That is no, you can't make a formal regex for a non-regular language. – Qtax Feb 05 '13 at 19:33
  • `^(?:a(?=a*(b\1?)))*\1$|^$` also works in .NET, but not in JS. It seems that the backreference is used to match the previous capture in the repetition (and not all regex engine supports this)? – nhahtdh Feb 05 '13 at 22:34
  • @nhahtdh, works in Java too. Didn't know about JS (altho wouldn't expect much there). In JS it seems that the backref doesn't get redefined after the first capture. – Qtax Feb 06 '13 at 22:43
0

In formal language theory something called a "pumping lemma" can be used to prove that certain sets of sentences (languages) can not be described by a regular expression. See wikipedia http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages. You start from the language you want to describe and use the pumping lemma to find a contradiction. The proof for your example is actually on that wikipedia page.

A similar theory exists for context-free languages. Some languages just can not be described by context-free grammars.

Jurgen Vinju
  • 6,393
  • 1
  • 15
  • 26