I am trying to get the start and end positions of each capture group found using the finditer() method in re.
For example:
import re

strng = 'move 12345-!'
matches = re.finditer('move ([0-9]+).*?', strng)
for each in matches:
    print(*each.groups())
    # start()/end() with no argument report the whole match, not the group
    print(each.start(), each.end())
This yields the start and end index positions, but for the whole matched pattern rather than the captured group specifically. I essentially want to always capture the number, as this will change. The word move will always be an anchor, but I don't want to include it in the position, as I need the actual position of each number within the text document so that I can slice out each number found.
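To make the goal concrete, here is a minimal sketch on the single string above. As far as I can tell from the re documentation, Match.start(), Match.end() and Match.span() accept an optional group number, so passing 1 should report the position of the first capture group instead of the whole match. The name group_spans is only illustrative, and I dropped the trailing .*? from the pattern since it matches nothing.

```python
import re

strng = 'move 12345-!'

# span(1) is the (start, end) pair of capture group 1 only;
# span() with no argument would cover 'move ' as well.
group_spans = [m.span(1) for m in re.finditer(r'move ([0-9]+)', strng)]
print(group_spans)

# The end index is exclusive, so it can be used directly in a slice.
start, end = group_spans[0]
print(strng[start:end])
```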
The full document might look like:
move 12345-!
move 57496-!
move 96038-!
move 00528-!
And I would capture 57496 at document[18:23], where the start of 57496 is at index 18 and the end (exclusive) is at index 23, assuming zero-based indexing and a single newline between lines. The underlying positions are being used to train a model.
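A sketch of the output I am hoping for on the whole document, under the assumption that the four sample lines are joined by single "\n" characters and that indexing is zero-based with an exclusive end (which places 57496 at 18 to 23). The helper name number_spans is hypothetical, not something from re.

```python
import re

# Assumed layout: the four sample lines separated by single newlines.
document = "move 12345-!\nmove 57496-!\nmove 96038-!\nmove 00528-!"

def number_spans(text):
    """Return (number, start, end) for every number that follows 'move '."""
    pattern = re.compile(r"move ([0-9]+)")
    # start(1)/end(1) refer to capture group 1, not to the whole match.
    return [(m.group(1), m.start(1), m.end(1)) for m in pattern.finditer(text)]

spans = number_spans(document)
for number, start, end in spans:
    # Slicing with the group's positions should reproduce the number.
    assert document[start:end] == number
    print(number, start, end)
```

If the real document uses "\r\n" line endings, every offset after the first line would shift accordingly.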