
I am working on trying to locate some registration numbers in some documents. The best tool for this seems to be Python's `re` module. I have created a regular expression that works, but I cannot make it work when I move to a named group.

Here is the original text I am trying to extract from:

    REGISTRATION NO.  874224207             PAGE 32

This regular expression works on Pythex:

    \s+\(?\s*REGISTRATION\s+NUMBER\)?[\.:]?\)?\s+[A-Z0-9#]{9}\s+|\s+\(?\s*REGISTRATION\s+NO\)?[\.:]?\)?\s+[A-Z0-9#]{9}\s+
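A quick check of the same pattern with Python's `re` instead of Pythex, assuming the indentation in the sample line is ordinary spaces. It does match, but because there is no group around the number, `group(0)` returns the whole span, label and surrounding whitespace included:

```python
import re

# The unnamed pattern from the question, split into its two | alternatives.
PATTERN = (
    r"\s+\(?\s*REGISTRATION\s+NUMBER\)?[\.:]?\)?\s+[A-Z0-9#]{9}\s+"
    r"|\s+\(?\s*REGISTRATION\s+NO\)?[\.:]?\)?\s+[A-Z0-9#]{9}\s+"
)

# Sample line from the question (assumes the gaps are plain spaces).
text = "    REGISTRATION NO.  874224207             PAGE 32"

m = re.search(PATTERN, text)
# The match covers the label and whitespace too; there is no group for the number alone.
print(repr(m.group(0)))
print(m.groups())   # ()
```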

But when I name my capture group `theregis` (the number is all I want from the result), I get no match:

    \s+\(?\s*REGISTRATION\s+NUMBER\)?[\.:]?\)?\s+(?P<theregis>[A-Z0-9#]{9})\s+|\s+\(?\s*REGISTRATION\s+NO\)?[\.:]?\)?\s+(?P=theregis)\s+
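A minimal reproduction with `re` shows why the `NO.` line finds nothing. The sample line is assumed to use plain spaces, and the `NUMBER` line is a made-up variant added for contrast. The group `theregis` is only defined in the first alternative; in the second alternative `(?P=theregis)` refers to a group that has captured nothing, and in Python's `re` a backreference to a group that did not participate fails to match:

```python
import re

# The named-group version from the question: the group is defined in the first
# alternative and only back-referenced with (?P=theregis) in the second.
NAMED = (
    r"\s+\(?\s*REGISTRATION\s+NUMBER\)?[\.:]?\)?\s+(?P<theregis>[A-Z0-9#]{9})\s+"
    r"|\s+\(?\s*REGISTRATION\s+NO\)?[\.:]?\)?\s+(?P=theregis)\s+"
)

no_line = "    REGISTRATION NO.  874224207             PAGE 32"
number_line = "    REGISTRATION NUMBER  874224207             PAGE 32"   # hypothetical variant

# "NO." can only reach the second alternative, where theregis is unset -> no match.
print(re.search(NAMED, no_line))                           # None
# "NUMBER" goes through the first alternative, which actually defines the group.
print(re.search(NAMED, number_line).group("theregis"))     # 874224207
```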

Per the docs:

  1. My named group is in parentheses.
  2. I begin my group with `?P`.
  3. My group has a name enclosed in `<>`.
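The three rules above are enough to define a named group and read it back. A small sketch, using a shortened pattern (an illustrative simplification, not the question's full regex) against the sample line:

```python
import re

# (?P<theregis>...) both captures and names the group.
m = re.search(r"NO\.\s+(?P<theregis>[A-Z0-9#]{9})",
              "REGISTRATION NO.  874224207  PAGE 32")

# The name is then used to pull just the captured text out of the match.
print(m.group("theregis"))   # 874224207
print(m.groupdict())         # {'theregis': '874224207'}
```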

When I use my named group:

  1. The group is placed in `()`.
  2. I begin with `?` and then `P=`.
  3. The group name matches the name I gave it.
  4. There are no extraneous characters inside the parentheses where I use the group name.
  5. I tried changing the group name to something else, with no luck.

Finally, I used this as my model:

    p = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')
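Running the model pattern makes the meaning of `(?P=word)` visible: it has to match the *same text* that `(?P<word>\w+)` just captured, which is why the pattern finds doubled words rather than any two words. The sample sentences are illustrative:

```python
import re

# (?P=word) re-matches the exact characters captured by (?P<word>\w+).
p = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

m = p.search('Paris in the the spring')
print(m.group('word'))   # the
print(m.group(0))        # the the

# Two different words do not match: the backreference repeats the captured
# text; it does not re-run the \w+ pattern.
print(p.search('Paris in the spring'))   # None
```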
PyNEwbie
  • You are trying to recurse the `theregis` pattern, but `(?P=theregis)` is a *backreference* that "repeats" the *text* captured into that group. You need to use `(?&theregis)`, but only with the PyPI *`regex`* module, which supports recursion in the same way PCRE does. – Wiktor Stribiżew Sep 16 '18 at 20:58
  • I don't understand your observation about me trying to recurse the pattern, so I guess this loses me. I am trying to avoid retyping the pattern and to understand how to use named groups. I am trying to do something much more complicated with a pattern and decided that learning to use named groups would help with my goals. I made this example because I was not successful and decided to simplify things as much as possible – PyNEwbie Sep 16 '18 at 21:04
  • I understand what you are trying to do, and it is exactly what you can't do with `re`, but you can do it with `regex`, as [shown here](https://regex101.com/r/ElyNfD/1). Or build the pattern dynamically. – Wiktor Stribiżew Sep 16 '18 at 21:14
  • Are you saying that the named group can't be used on the other side of the pipe? Sorry for such a simplistic question – PyNEwbie Sep 16 '18 at 21:56
  • No, I said you cannot re-use the named capturing group pattern by means of a named group backreference. There are other ways for it, even when using `re`, see the linked thread. – Wiktor Stribiżew Sep 16 '18 at 22:06
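The comments mention two ways to avoid retyping the sub-pattern while staying with plain `re`. Below is a sketch of both under stated assumptions; the names `REG_ID`, `DYNAMIC`, `FACTORED`, `find_dynamic`, `theregis_num` and `theregis_no` are hypothetical, and the second sample line is made up. Approach 1 builds the pattern with an f-string; because `re` rejects duplicate group names, each copy gets its own name. Approach 2 moves the `NUMBER`/`NO` difference into a non-capturing group so a single `theregis` group is enough; it also drops the leading `\s+` and replaces the trailing `\s+` with `(?!\S)` (my substitution) so a number at the end of the text still matches:

```python
import re

REG_ID = r"[A-Z0-9#]{9}"   # hypothetical name: the sub-pattern, typed once

# Approach 1: build the pattern dynamically; each alternative needs its own group name.
DYNAMIC = re.compile(
    rf"\s+\(?\s*REGISTRATION\s+NUMBER\)?[\.:]?\)?\s+(?P<theregis_num>{REG_ID})\s+"
    rf"|\s+\(?\s*REGISTRATION\s+NO\)?[\.:]?\)?\s+(?P<theregis_no>{REG_ID})\s+"
)

def find_dynamic(text):
    """Return the number from whichever alternative matched, or None."""
    m = DYNAMIC.search(text)
    if m is None:
        return None
    # The group from the alternative that did not match is None.
    return m.group("theregis_num") or m.group("theregis_no")

# Approach 2: factor NUMBER/NO into (?:...) so the tail and the group appear once.
# (?!\S) means "followed by whitespace or the end of the text".
FACTORED = re.compile(
    rf"\(?\s*REGISTRATION\s+(?:NUMBER|NO)\)?[\.:]?\)?\s+(?P<theregis>{REG_ID})(?!\S)"
)

line = "    REGISTRATION NO.  874224207             PAGE 32"
print(find_dynamic(line))                          # 874224207
print(FACTORED.search(line).group("theregis"))     # 874224207

# Made-up second line to show the NUMBER spelling is handled too.
document = line + "\n    (REGISTRATION NUMBER): A1234567B\n"
print([m.group("theregis") for m in FACTORED.finditer(document)])
# ['874224207', 'A1234567B']
```

The factored version is usually easier to maintain, since there is only one group name to read back; the dynamic version keeps the original two-alternative shape intact.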

0 Answers