15

I'm interested how can be implemented recursive regexp matching in Python (I've not found any examples :( ). For example how would one write expression which matches "bracket balanced" string like "foo(bar(bar(foo)))(foo1)bar1"

ThomasH
  • 22,276
  • 13
  • 61
  • 62
giolekva
  • 1,158
  • 2
  • 11
  • 23

3 Answers3

15

You could use pyparsing

#!/usr/bin/env python
from pyparsing import nestedExpr
import sys
astring=sys.argv[1]
if not astring.startswith('('):
    astring='('+astring+')'

expr = nestedExpr('(', ')')
result=expr.parseString(astring).asList()[0]
print(result)

Running it yields:

% test.py "foo(bar(bar(foo)))(foo1)bar1"
['foo', ['bar', ['bar', ['foo']]], ['foo1'], 'bar1']
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
4

You can't do it with a regexp. Python doesn't support recursive regexp

John La Rooy
  • 295,403
  • 53
  • 369
  • 502
2

With PyPi regex, that you can easily install using pip install regex, you may use

import regex

pattern = r'[^()]*+(\((?>[^()]|(?1))*+\)[^()]*+)++'
text = 'foo(bar(bar(foo)))(foo1)bar1'
print(bool(regex.fullmatch(pattern, text)))
# => True

See the Python demo, see the regex pattern demo (note the \A and \z anchors are added in the demo because regex.fullmatch requires a full string match).

Pattern details

  • \A - implicit in regex.fullmatch - start of string
  • [^()]*+ - 0 or more chars other than ( and ) (possessive match, no backtracking into the pattern allowed)
  • (\((?>[^()]|(?1))*+\)[^()]*+)++ - 1+ occurrences of Group 1 pattern:
    • \( - a ( char
    • (?>[^()]|(?1))*+ - 1+ repetitions (possessive match) of
      • [^()] - any char but ( and )
      • | - or
      • (?1) - a regex subroutine that recurses Group 1 pattern
    • \) - a ) char
    • [^()]*+ - 0 or more chars other than ( and ) (possessive match)
  • \z - implicit in regex.fullmatch - end of string.

See the pattern and more information on regex subroutines at regular-expressions.info.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563