python finditer get start end of capture group

Question

I am trying to get the start and end positions of a capture group for each match found by re.finditer().

For example:

import re

strng = 'move 12345-!'
matches = re.finditer('move ([0-9]+).*?', strng)
for each in matches:
    print(*each.groups())
    print(each.start(), each.end())

This yields the start and end index of the whole matched pattern, not of the captured group. I always want to capture the number, which will change. The word move will always be the anchor, but I don't want to include it in the position: I need the actual position of each number within the text document so that I can slice it out.

Full document might be like:

move 12345-!
move 57496-!
move 96038-!
move 00528-!

And I would capture 57496 with document[18:23], where 57496 starts at index 18 and ends (exclusive) at index 23, assuming the lines are separated by a single newline character. The underlying positions are being used to train a model.

So use `start(1)` and `end(1)`? – no comment, Sep 28 '21 at 22:40

score 2 · Accepted Answer · answered Sep 28 '21 at 22:49

If you don't want move to be part of the match, you can turn it into a positive lookbehind, which asserts that move appears directly to the left without consuming it.

Then each.group() returns only the digits, and each.start() and each.end() give the positions of the digits themselves.

Note that you can omit .*? at the end of the pattern: it is a non-greedy quantifier with nothing after it, so it will not match any characters.

import re

strng = 'move 12345-!'
matches = re.finditer('(?<=move )[0-9]+', strng)
for each in matches:
    print(each.group())
    print(each.start(), each.end())

Output

12345
5 10
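
To check this against the full document from the question, here is a minimal sketch. It assumes the four lines are joined by a single \n (the names document, positions, with_lazy and without_lazy are only illustrative); with a different line ending such as \r\n the indices would shift.

```python
import re

# Sample document from the question; lines are ASSUMED to be joined by "\n".
document = "move 12345-!\nmove 57496-!\nmove 96038-!\nmove 00528-!"

# The lookbehind keeps "move " out of the match, so start()/end()
# describe the digits only.
positions = [
    (m.group(), m.start(), m.end())
    for m in re.finditer(r"(?<=move )[0-9]+", document)
]

for text, start, end in positions:
    # Slicing with the reported positions gives back the same digits.
    assert document[start:end] == text
    print(text, start, end)

# A trailing lazy .*? matches zero characters, so the overall span is unchanged.
with_lazy = re.search(r"move ([0-9]+).*?", document)
without_lazy = re.search(r"move ([0-9]+)", document)
assert with_lazy.span() == without_lazy.span()
```

Under that assumption, each number can be sliced back out with document[start:end], and the last two searches show that dropping .*? does not change what is matched.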

score 1 · Answer 2 · answered Sep 28 '21 at 22:41

>>> import re
>>> strng = "move 12345-!"
>>> matches = re.finditer('move ([0-9]+).*?', strng)
>>> for each in matches:
...     print(each.group(1))
...     print(each.start(1), each.end(1))
...
12345
5 10
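
The same approach extends to the multi-line document in the question. As a sketch, assuming the lines are joined by a single \n: span(1) returns the start and end of group 1 as one tuple, equivalent to (start(1), end(1)). The names document and spans are only illustrative.

```python
import re

# Sample document from the question; lines are ASSUMED to be joined by "\n".
document = "move 12345-!\nmove 57496-!\nmove 96038-!\nmove 00528-!"

# Collect (captured text, start, end) for group 1 of every match.
spans = [
    (m.group(1), *m.span(1))
    for m in re.finditer(r"move ([0-9]+)", document)
]

for text, start, end in spans:
    # The group's span slices exactly the captured number out of the document.
    assert document[start:end] == text
    print(text, start, end)
```

Here "move " is still part of the overall match, but passing the group number to span(), start() or end() restricts the reported positions to the digits.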

answered Sep 28 '21 at 22:41

AJNeufeld
