
I want to print the first, second and third matched groups in an expression. Here are the details.

Regex Pattern = "(\d+)"
Expression = "1123-xxx-abcd-45-tsvt-35-pwrst-99-xql"

I tested the pattern on Pythex, where it works perfectly and displays all the captured groups: https://pythex.org/?regex=(%5Cd%2B)&test_string=1123-xxx-abcd-45-tsvt-35-pwrst-99-xql&ignorecase=0&multiline=0&dotall=0&verbose=0

But it does not work in Python code. My Python code is below; I am unable to find the problem.

import re


class Test3:

    def printAllGroups(self):
        regexPattern = r"(\d+)"
        text = "1123-xxx-abcd-45-tsvt-35-pwrst-99-xql"
        matcher = re.compile(regexPattern, flags=re.IGNORECASE)
        matchValue = matcher.match(text)
        if matchValue:
            print("First group : ", matchValue.group(1))
            print("Second group : ", matchValue.group(2))
            print("Third group : ", matchValue.group(3))


if __name__ == '__main__':
    test3 = Test3()
    test3.printAllGroups()

Please help me solve this problem. I am new to Python.

Allan
Sambit

1 Answer


Code:

import re

regexPattern = r"(\d+)"
expression = "1123-xxx-abcd-45-tsvt-35-pwrst-99-xql"
print(re.findall(regexPattern, expression))

Output:

['1123', '45', '35', '99']
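If you want to print the first, second and third numbers separately, one option is to walk through the list that re.findall returns. This is only a minimal sketch reusing the pattern and string above; the names `numbers` and `labels` are made up for illustration.

```python
import re

regexPattern = r"(\d+)"
expression = "1123-xxx-abcd-45-tsvt-35-pwrst-99-xql"

# With a single capturing group, findall returns one string per match
numbers = re.findall(regexPattern, expression)

labels = ["First", "Second", "Third"]
# zip stops at the shorter sequence, so a string with fewer than
# three numbers does not raise an IndexError
for label, number in zip(labels, numbers):
    print(label, "match :", number)
```

Each element of the list is a separate match of the same single group, not a different group of one match.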

Your current code raises this error:

    print("Second group : ", matchValue.group(2))
IndexError: no such group

because there is only one capturing group in the regex, so group(2) does not exist.
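You can check this yourself with a small sketch like the one below. The compiled pattern's `groups` attribute reports how many capturing groups the regex defines, and match() only tries to match at the start of the string, so it stops after the first number. The variable names here are just for illustration.

```python
import re

pattern = re.compile(r"(\d+)")
text = "1123-xxx-abcd-45-tsvt-35-pwrst-99-xql"

# Number of capturing groups defined by the pattern
print(pattern.groups)

# match() only tries at the beginning of the string
m = pattern.match(text)
print(m.group(0))  # the whole match
print(m.group(1))  # the only capturing group

try:
    m.group(2)  # the pattern has no second group
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)
```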

If you change your code as shown further down, using the regex explained at https://regex101.com/r/BjTrgU/2, you get one single match (the whole line) with four groups that you can access individually to extract the numbers.

It is important to distinguish between a match (when your regex matches an input string) and the values stored in the groups of your regex, each group being defined by a pair of parentheses ().

The 1st occurrence of () in the regex is accessible via the backreference \1 inside the regex (or the replacement string), or via group(1) outside the regex. The 2nd occurrence of () is accessible via the backreference \2 inside the regex (or the replacement string), or via group(2) outside the regex, and so on.
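As a rough illustration of the numbering, here is a small sketch with a two-group pattern. The pattern and the short sample strings are made up for this example and are not the ones from the question.

```python
import re

# Two pairs of parentheses, so two groups, numbered left to right
m = re.match(r"(\d+)-(\w+)", "1123-xxx-abcd")
whole = m.group(0)   # the full match
first = m.group(1)   # text captured by the 1st ()
second = m.group(2)  # text captured by the 2nd ()
print(whole, first, second)

# In a replacement string, \1 and \2 refer to the same groups
swapped = re.sub(r"(\d+)-(\w+)", r"\2-\1", "1123-xxx")
print(swapped)

# Inside the pattern, \1 must match the same text that group 1 captured
repeated = re.search(r"(\d+)-\1", "45-45-tsvt")
print(repeated.group(0))
```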

import re

class Test3:

    def printAllGroups(self):
        regexPattern = r"^(\d+)[^\d]+(\d+)[^\d]+(\d+)[^\d]+(\d+)[^\d]+$"
        text = "1123-xxx-abcd-45-tsvt-35-pwrst-99-xql"
        matcher = re.compile(regexPattern, flags=re.IGNORECASE)
        matchValue = matcher.match(text)
        if matchValue:
            print("First group : ", matchValue.group(1))
            print("Second group : ", matchValue.group(2))
            print("Third group : ", matchValue.group(3))
            print("Fourth group : ", matchValue.group(4))

if __name__ == '__main__':
    test3 = Test3()
    test3.printAllGroups()

Output:

python test3.py
('First group : ', '1123')
('Second group : ', '45')
('Third group : ', '35')
('Fourth group : ', '99')
Allan