3

This question and its answers here suggest that both:

  1. It is possible to use regex to match nested patterns.

  2. And that it is not possible, because nested patterns are not regular languages and therefore DFAs (which regex are) cannot recognize them.

Before reading an answer (https://stackoverflow.com/a/3851098/2876289) to the above question, I would always side with 2. Now however I'm not so sure.

Can '/(\((?>[^()]+|(?1))*\))/'

really match nested patterns?

That being asked - I tried the above in vim (and JavaScript) and it doesn't work. Though perhaps it needs to be converted to a different syntax. The answer quoting it has 9 upvotes.

Community
  • 1
  • 1
Cris
  • 441
  • 3
  • 11

1 Answers1

4

You must make the difference between three things:

  1. implementations of regex that have features to deal with nested things.
  2. implementations of regex that don't have these features
  3. "regular expression" in a theorical meaning

case 1: I know two features that can deal with nested parenthesis (or other things)

  • the capability to refer to the subpattern of a capture group, it is the case of your example:

    (\((?>[^()]+|(?1))*\))

(?1) refers to the capture group 1 (inside itself to obtain a recursion)

This feature is available in the PCRE regex engine (PHP, R,), in Ruby (with oniguruma syntax), in Perl, in the new regex module of Python, XRegExp library for javascript, libboost...

  • a stack system, like in .net:

    (?:[^()]|(?<Open>[(])|(?<-Open>[)]))*(?(Open)(?!))

(take a look at this excellent post for more informations.)

case 2: implementations that don't have these features: Javascript, the re module of Python, Java...

case 3: In a theorical meaning "regular expression" is a description of a regular language. Since an undetermined level of nesting is not a regular language, it can't be describe with a "regular expression". However, the important point is that this acceptance of the term 'regular expression' has not much to do with the term 'regex' (or 'regular expression') as it is commonly used, that is a misuse of language.

Community
  • 1
  • 1
Casimir et Hippolyte
  • 88,009
  • 5
  • 94
  • 125