Note: I know none of this is supported in the existing re module. I am using the newer regex module, which is intended to replace re in the future.

I need to build some complex regular expressions, but I would also like those expressions to be maintainable. I don't want anyone to come back to this code months later and have to spend days unravelling or re-writing the expression, myself included. :P

There is some PCRE syntax that I've previously used to accomplish this, e.g.:

/
(?(DEFINE)
  (?<userpart> thomas | richard | harold )
  (?<domainpart> gmail | yahoo | hotmail )
  (?<tld> com | net | co\.uk )
  (?<email> (?&userpart)@(?&domainpart)\.(?&tld) )
)
^ To: \s+ .* \s+ < (?&email) > $
/ix

Will match the line: To: Tom Selleck <thomas@gmail.com>

Note²: I'm not trying to match email addresses, it's just an example.

I see that the regex module has implemented recursive patterns and named recursive patterns, but it does not seem to like the (?(DEFINE) ... ) syntax: it fails with the error "unknown group at position 10".

Is it at all possible to pre-define named patterns like this in Python?

Sammitch
  • Did you try writing, for example: `(?<tld> com | net | co\.uk ){0}` – Casimir et Hippolyte Apr 04 '14 at 19:26
  • Recursion is different from a define construct. Also, it is not mentioned in the [documentation](https://pypi.python.org/pypi/regex) that the regex module supports it. – Jerry Apr 04 '14 at 19:26
  • I don't see anything like that in the docs for regex, so I think the answer is no. Am I right that you can still achieve the effect that you want in terms of match behavior, you just want to write the regex in a more readable way? – BrenBarn Apr 04 '14 at 19:26
  • @CasimiretHippolyte you win the super. Removed the `(?(DEFINE) ...)` block, added `{0}` to the ends of the patterns, and it worked! If you want to formalize that into an answer I'll gladly accept. – Sammitch Apr 04 '14 at 19:29
  • Youpi!!! <°))))))))))> – Casimir et Hippolyte Apr 04 '14 at 19:30

1 Answer

Since the new Python regex module has no syntax like the Perl/PCRE (?(DEFINE)....), you can use this trick instead (I think it works in Ruby too):

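import regex

# Each named group is followed by {0}, so it is never matched where it is
# written; it only defines a subpattern that (?&name) can call later.
pattern = r'''
  (?<userpart> thomas | richard | harold ){0}
  (?<domainpart> gmail | yahoo | hotmail ){0}
  (?<tld> com | net | co\.uk ){0}
  (?<email> (?&userpart)@(?&domainpart)\.(?&tld) ){0}

  ^ To: \s+ .* \s+ < (?&email) > $
'''

# VERBOSE and IGNORECASE correspond to the /x and /i flags in the question.
to_line = regex.compile(pattern, regex.VERBOSE | regex.IGNORECASE)
print(to_line.search('To: Tom Selleck <thomas@gmail.com>'))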

Because each group carries the quantifier {0}, it matches nothing where it stands and acts only as a zero-width group definition, so you can put it anywhere in the pattern.
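
For comparison, if the third-party regex module is not available, a similar (non-recursive) effect can be approximated with the standard re module by composing named fragments as ordinary Python strings. This is only a sketch of a different technique, not the {0} trick itself: the constant names (USERPART, DOMAINPART, TLD, EMAIL, TO_LINE) are made up for illustration, and plain string composition cannot express recursive patterns the way (?&name) can.

```python
import re

# Hypothetical fragment names mirroring the groups in the pattern above.
# Each fragment is wrapped in a non-capturing group so it composes safely.
USERPART = r"(?:thomas|richard|harold)"
DOMAINPART = r"(?:gmail|yahoo|hotmail)"
TLD = r"(?:com|net|co\.uk)"
EMAIL = rf"{USERPART}@{DOMAINPART}\.{TLD}"

# re.VERBOSE ignores the layout whitespace; re.IGNORECASE mirrors the /i flag.
TO_LINE = re.compile(
    rf"""
    ^ To: \s+ .* \s+ < (?P<email> {EMAIL} ) > $
    """,
    re.VERBOSE | re.IGNORECASE,
)

match = TO_LINE.search("To: Tom Selleck <thomas@gmail.com>")
print(match.group("email") if match else None)
```

The fragments contain no whitespace themselves, so interpolating them into a verbose pattern does not change their meaning. If a fragment needed a literal space or #, it would have to be escaped before being used this way.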

Casimir et Hippolyte