8

I am reading the code of a regular expression parser, and start to wonder if the syntax of regular expression is itself regular, and can be expressed with another (quite complicated) regular expression?

rere = "" # the regular expression of regular language
match1 = re.match(rere, "[a-z]+@[a-z]+.com") # True
match2 = re.match(rere, ")az[") # False 

I don't see any recursive structure in regular expression syntax, so I think maybe this is doable?

If it is, what does the expression look like? If not, why?

NeoWang
  • 17,361
  • 24
  • 78
  • 126
  • 3
    No. You need context-free grammar to parse regular expression. Nested parentheses can't be parsed with (theoretical) regular expression. – nhahtdh Sep 03 '15 at 06:54
  • Yes, nested parentheses. I forgot about that. But if I don't support group inside group, would the answer be different? – NeoWang Sep 03 '15 at 06:58
  • 1
    @NeoWang: Then what you have is weaker than regular expression. i.e. there are languages where regular expression/regular grammar can described, but not your grammar. – nhahtdh Sep 03 '15 at 07:00
  • Actually, you can match nested parentheses with regex, but only in some regex flavors. Your example code is Python, and its regex engine does not support [recursive behavior](http://www.regular-expressions.info/subroutine.html#balanced)/[balanced constructs](http://www.regular-expressions.info/balancing.html). However, there is no magical regex that will "parse them all". – Wiktor Stribiżew Sep 03 '15 at 07:08
  • @stribizhev: Those flavor are not strictly "regular" in the theoretical sense, but if the question specifically refers to real world "regex" engines, then I guess it's possible for some flavors. – nhahtdh Sep 03 '15 at 07:31
  • This may be off topic... Why are most regex parsers hand written? Why not write its context free grammar and use a parser generator? – NeoWang Sep 03 '15 at 08:06

1 Answers1

5

You cannot parse nested parentheses with a regular expression because you would need infinite state to do so. So the answer is no. What you are looking for is called context-free grammars.

Emil Vikström
  • 90,431
  • 16
  • 141
  • 175