To match nested structures, some regexp dialects provide recursive patterns such as (?R). The (?R) construct means, roughly, "whatever this whole expression matches", applied again at that position. Python's standard re module doesn't support recursion, but the third-party regex module (available from PyPI and largely API-compatible with re) does. Here's a complete example.
text = """
{{some text}}
some other text
{{Infobox President
birth|d/m/y
other_inner_text:{{may contain {curly} bracket}}
other text}}
some other text
or even another infobox
{{Infobox Cabinet
same structure
{{text}}also can contain {{}}
}}
can be some other text...
"""
import regex
rx = r"""
{{ # open
( # this match
(?: # contains...
[^{}] # no brackets
| # or
}[^}] # single close bracket
| # or
{[^{] # single open bracket
| # or
(?R) # the whole expression once again <-- recursion!
)* # zero or more times
) # end of match
}} # close
"""
rx = regex.compile(rx, regex.X | regex.S)
for p in rx.findall(text):
    # findall returns the contents of the single capturing group
    print('FOUND: (((', p, ')))')
Result:
FOUND: ((( some text )))
FOUND: ((( Infobox President
birth|d/m/y
other_inner_text:{{may contain {curly} bracket}}
other text )))
FOUND: ((( Infobox Cabinet
same structure
{{text}}also can contain {{}}
)))
For a great explanation of recursive regexps see this blog entry.

(couldn't resist stealing this one).
That said, you'd probably be better off with a parser-based solution. See, for example, parsing nested expressions with pyparsing.
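If pulling in a dependency isn't an option, the same outermost {{...}} blocks can also be collected with a small hand-written scanner that tracks nesting depth. This is only a minimal sketch using the standard library, not the pyparsing approach mentioned above; the function name find_top_level_blocks and its parameters are made up for illustration, and stray or unbalanced delimiters are simply skipped rather than reported.

```python
def find_top_level_blocks(text, open_tok="{{", close_tok="}}"):
    """Return the contents of every outermost open_tok ... close_tok block.

    Hypothetical helper: it counts nesting depth instead of using a regex.
    """
    blocks = []
    depth = 0      # how many open_tok we are currently inside
    start = None   # index just past the outermost open_tok
    i = 0
    while i < len(text):
        if text.startswith(open_tok, i):
            if depth == 0:
                start = i + len(open_tok)
            depth += 1
            i += len(open_tok)
        elif depth > 0 and text.startswith(close_tok, i):
            depth -= 1
            if depth == 0:
                # closed the outermost block: keep everything inside it
                blocks.append(text[start:i])
            i += len(close_tok)
        else:
            # ordinary character, or a single { or }
            i += 1
    return blocks


sample = "{{some text}} x {{a {{b {c} d}} e}} y {{p {{}} q}}"
for block in find_top_level_blocks(sample):
    print('FOUND: (((', block, ')))')
```

Unlike the recursive pattern, this version makes it easy to add error handling later, for instance raising an exception when depth is still non-zero at the end of the input.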