I have the following regex pattern:

line_re = re.compile(r'(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+(\S+):\s+(?P<name>.*)')

I am trying to understand what the ?P<name> means. The expression works the same even when I remove it, i.e.:

line_re = re.compile(r'(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+(\S+):\s+(.*)')

I know that I can reference the matched patterns with match.group(3). What is the ?P<name> for?

Martin Vegter

1 Answer

From the re module documentation:

(?P<name>...)
Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named.

So the pattern you changed it to is essentially the same, except that you can no longer access that group by name, only by its number.
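As a rough illustration, the two patterns from the question can be compared side by side. The sample log line here is made up for the example and is not from the question.

```python
import re

# The two patterns from the question: one with a named third group, one without.
named = re.compile(r'(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+(\S+):\s+(?P<name>.*)')
plain = re.compile(r'(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+(\S+):\s+(.*)')

# Hypothetical log line, invented for illustration only.
sample = '2013-05-01 12:34:56 host1: service started'

m_named = named.match(sample)
m_plain = plain.match(sample)

# Both patterns expose the third group by number.
by_number_named = m_named.group(3)
by_number_plain = m_plain.group(3)

# Only the named pattern also exposes it by name.
by_name = m_named.group('name')

# Asking the unnamed pattern for a group called 'name' raises IndexError.
try:
    m_plain.group('name')
    plain_has_name = True
except IndexError:
    plain_has_name = False
```

Both patterns match the same text; the name only adds a second, more readable way to reach the third group.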

To understand the difference I recommend you read up on Non-capturing And Named Groups in the Regular Expression HOWTO.

You can access named groups by passing the name to the MatchObject.group() method, or get a dictionary containing all named groups with MatchObject.groupdict(); this dictionary would not include positional groups.
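A minimal sketch of both access styles, using the pattern from the question and a made-up sample line (the log line is an assumption for illustration):

```python
import re

line_re = re.compile(r'(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+(\S+):\s+(?P<name>.*)')

# Hypothetical log line, invented for illustration only.
sample = '2013-05-01 12:34:56 host1: service started'
match = line_re.match(sample)

# Look up the named group by passing its name to group().
message = match.group('name')

# groupdict() returns only the named groups, keyed by name.
named_groups = match.groupdict()

# groups() returns every capturing group, named or not, in positional order.
all_groups = match.groups()

# The compiled pattern records which number each name maps to.
name_to_number = dict(line_re.groupindex)
```

Here `named_groups` holds a single entry for `name`, while `all_groups` holds all three captures, since the first two groups are positional only.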

Martijn Pieters