1

I wrote a regex pattern in Python, but re.match() does not seem to capture the groups after the | alternation operator.

Here is the pattern:

pattern = r"00([1-9]\d) ([1-9]\d) ([1-9]\d{5})|\+([1-9]\d) ([1-9]\d) ([1-9]\d{5})"

I test the pattern against a string that should match, "+12 34 567890":

import re

strng = "+12 34 567890"
pattern = r"00([1-9]\d) ([1-9]\d) ([1-9]\d{5})|\+([1-9]\d) ([1-9]\d) ([1-9]\d{5})"
m = re.match(pattern, strng)
print(m.group(1))

None is printed.

But if I delete the part before the | alternation operator:

import re

strng = "+12 34 567890"
pattern = r"\+([1-9]\d) ([1-9]\d) ([1-9]\d{5})"
m = re.match(pattern, strng)
print(m.group(1))
print(m.group(2))
print(m.group(3))

It can capture all 3 groups:

12
34
567890

Thanks so much for your thoughts!

Elle

2 Answers

1

The | operator has no effect on group numbering: groups are always numbered from left to right, by the position of their opening parenthesis in the regex itself.

In your original regex, there are 6 groups:

In [270]: m.groups()
Out[270]: (None, None, None, '12', '34', '567890')

The second alternative is the one that matched, so you need:

In [271]: m.group(4)
Out[271]: '12'
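
To see the left-to-right numbering with both kinds of input, here is a small sketch that reuses the pattern from the question. The helper name captured and the step that drops the None entries are assumptions added for illustration only; they are not part of the question.

```python
import re

pattern = r"00([1-9]\d) ([1-9]\d) ([1-9]\d{5})|\+([1-9]\d) ([1-9]\d) ([1-9]\d{5})"

def captured(s):
    """Hypothetical helper: return only the groups that took part in the match."""
    m = re.match(pattern, s)
    if m is None:
        return None
    # Groups 1-3 belong to the "00" alternative and groups 4-6 to the "+" one;
    # the groups of the alternative that did not match are None.
    return [g for g in m.groups() if g is not None]

for s in ("+12 34 567890", "0012 34 567890"):
    print(re.match(pattern, s).groups())
    print(captured(s))
```

With the "+" input the first three entries of groups() are None, and with the "00" input the last three are, so the filtered list holds the same three values either way.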
llllllllll
0

You want to support two different prefixes, 00 and +, at the start of the number. You can merge the alternatives into a non-capturing group, so the three capturing groups are shared by both prefixes and are always numbered 1 to 3:

import re
strng = "+12 34 567890"
pattern = r"(?:00|\+)([1-9]\d) ([1-9]\d) ([1-9]\d{5})$"
m = re.match(pattern, strng)
if m:
    print(m.group(1))
    print(m.group(2))
    print(m.group(3))

See the regex demo and the Python demo yielding

12
34
567890

The regex at the regex testing site is prepended with ^ (start of string) because re.match only matches at the start of the string. The whole pattern now matches:

  • ^ - start of string (implicit in re.match)
  • (?:00|\+) - a non-capturing group matching either 00 or +
  • ([1-9]\d) - Capturing group 1: a digit from 1 to 9 and then any digit
  • (space) - a literal space (replace with \s to match any single whitespace character)
  • ([1-9]\d) - Capturing group 2: a digit from 1 to 9 and then any digit
  • (space) - a literal space (replace with \s to match any single whitespace character)
  • ([1-9]\d{5}) - Capturing group 3: a digit from 1 to 9 and then any 5 digits
  • $ - end of string.

Remove $ if you do not need to match the end of the string right after the number.
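
As a rough illustration of what the $ changes, the sketch below runs the pattern with and without it. The variable names and the input with trailing text are made up for the example.

```python
import re

anchored = r"(?:00|\+)([1-9]\d) ([1-9]\d) ([1-9]\d{5})$"
unanchored = r"(?:00|\+)([1-9]\d) ([1-9]\d) ([1-9]\d{5})"

trailing = "+12 34 567890 ext. 5"  # hypothetical input with extra text after the number

# With $, the number must be the last thing in the string, so this is None.
print(re.match(anchored, trailing))

# Without $, the number is matched and the trailing text is ignored.
print(re.match(unanchored, trailing).groups())

# Both prefixes fill the same groups 1 to 3.
print(re.match(anchored, "0012 34 567890").groups())
```

If the intent is to validate the whole string, re.fullmatch with the pattern without $ is another option to consider.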

Wiktor Stribiżew
  • Thank you so much for your detailed explanation! I learnt a new skill (non-capturing group) in your post. – Elle May 22 '18 at 02:18
  • @Elle Are you sure you do not want the whole string to be matched with your regex? If you do, then my answer is the correct one. – Wiktor Stribiżew May 22 '18 at 06:26