I have the following complete code example:

import re

examples = [
    "D1",       # expected: ('1')
    "D1sjdgf",  # ('1')
    "D1.2",     # ('1', '2')
    "D1.2.3",   # ('1', '2', '3')
    "D3.10.3x", # ('3', '10', '3')
    "D3.10.11"  # ('3', '10', '11')
]

for s in examples:
    result = re.search(r'^D(\d+)(?:\.(\d+)(?:\.(\d+)))', s)
    print(s, result.groups())

where I want to capture the numbers in a string that always starts with the letter "D". There may be one, two, or three of them. I am not interested in anything after the last digit.

I would expect my regex to match e.g. D3.10.3x and return ('3','10','3'), but instead it returns only ('3',). I do not understand why.

^D(\d+)(?:\.(\d+)(?:\.(\d+)))

  • ^D matches "D" at the start
  • (\d+) matches the first number (one or more digits) inside a group.
  • (?: starts a non-matching group. I do not want to get this group back.
  • \. A literal point
  • (\d+) A group of one or more numbers I want to "catch"

I also do not know what a "non-capturing" group means in this context, as used in this answer.

Alex

1 Answer

You may use this regex with a start anchor, one mandatory capture group, and two more capture groups inside nested optional non-capture groups:

^D(\d+)(?:\.(\d+)(?:\.(\d+))?)?

Explanation:

  • ^: Start
  • D: Match letter D
  • (\d+): Match 1+ digits in capture group #1
  • (?:: Start outer non-capture group
    • \.: Match a dot
    • (\d+): Match 1+ digits in capture group #2
    • (?:: Start inner non-capture group
      • \.: Match a dot
      • (\d+): Match 1+ digits in capture group #3
    • )?: End inner optional non-capture group
  • )?: End outer optional non-capture group
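
To see why the two `?` quantifiers matter, here is a small sketch comparing the pattern from the question with the one above. The names `strict` and `optional` are only for illustration. Without `?`, both dotted parts are required, so `re.search` finds no match for inputs with fewer than three numbers and returns `None`. The last two lines show what "non-capturing" means: `(?:...)` groups tokens without adding an entry to `groups()`.

```python
import re

# Pattern from the question: both non-capture groups are mandatory
strict = re.compile(r'^D(\d+)(?:\.(\d+)(?:\.(\d+)))')
# Pattern from this answer: each non-capture group is optional
optional = re.compile(r'^D(\d+)(?:\.(\d+)(?:\.(\d+))?)?')

strict_d1 = strict.search("D1")            # no match: the dotted parts are missing
strict_full = strict.search("D3.10.3x")    # all three numbers present
optional_d1 = optional.search("D1")        # matches, unused groups are None

print(strict_d1)               # None
print(strict_full.groups())    # ('3', '10', '3')
print(optional_d1.groups())    # ('1', None, None)

# Capturing vs. non-capturing parentheses around the same tokens
capturing = re.search(r'^D(\d+)(\.(\d+))?', "D1.2")
non_capturing = re.search(r'^D(\d+)(?:\.(\d+))?', "D1.2")
print(capturing.groups())      # ('1', '.2', '2')
print(non_capturing.groups())  # ('1', '2')
```

Note that calling `.groups()` on a failed search raises `AttributeError`, because the result is `None`.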

Code Demo:

import re

examples = [
    "D1",       # expected: ('1')
    "D1sjdgf",  # ('1')
    "D1.2",     # ('1', '2')
    "D1.2.3",   # ('1', '2', '3')
    "D3.10.3x", # ('3', '10', '3')
    "D3.10.11"  # ('3', '10', '11')
]

rx = re.compile(r'^D(\d+)(?:\.(\d+)(?:\.(\d+))?)?')

for s in examples:
    result = rx.search(s)
    print(s, result.groups())

Output:

D1 ('1', None, None)
D1sjdgf ('1', None, None)
D1.2 ('1', '2', None)
D1.2.3 ('1', '2', '3')
D3.10.3x ('3', '10', '3')
D3.10.11 ('3', '10', '11')
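
If you would rather get exactly the tuples listed in the question's comments, without the `None` placeholders, one possible approach is to filter them out. This is a minimal sketch; the helper name `numbers` is hypothetical, and returning an empty tuple for non-matching input is an assumption about what you want in that case.

```python
import re

rx = re.compile(r'^D(\d+)(?:\.(\d+)(?:\.(\d+))?)?')

def numbers(s):
    """Return the captured numbers as a tuple of strings, dropping unused groups."""
    m = rx.search(s)
    if m is None:
        # Assumption: input that does not start with D<digits> yields ()
        return ()
    return tuple(g for g in m.groups() if g is not None)

print(numbers("D1"))        # ('1',)
print(numbers("D1.2"))      # ('1', '2')
print(numbers("D3.10.3x"))  # ('3', '10', '3')
print(numbers("X1"))        # ()
```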
anubhava
  • Sorry, I used your suggestion in some other regex. Seems to work. And maybe I even understand a bit how it works. Thanks! – Alex Aug 11 '22 at 15:34