
I am trying to get the start and end positions of a capture group for each match returned by re.finditer().

For example:

import re

strng = 'move 12345-!'
matches = re.finditer('move ([0-9]+).*?', strng)
for each in matches:
    print(*each.groups())
    print(each.start(), each.end())

This yields the start and end index of the whole matched pattern, not of the captured group. I always want to capture the number, which changes from line to line. The word move will always be the anchor, but I don't want to include it in the position: I need the actual position of each number within the text document so that I can slice it out.

Full document might be like:

move 12345-!
move 57496-!
move 96038-!
move 00528-!

And I would capture 57496 as document[18:23], where the start of 57496 is at 18 and the end (exclusive) is at 23, assuming the lines are separated by a single newline character. The underlying positions are being used to train a model.

Wiktor Stribiżew
dataviews

2 Answers


If you don't want move to be part of the match, you can turn it into a positive lookbehind, which asserts that it appears directly to the left without including it in the match.

Then each.group() returns only the digits, and each.start() and each.end() give their positions.

Note that you can omit .*? at the end of the pattern: it is a non-greedy quantifier with nothing after it, so it will not match any characters.

import re

strng = 'move 12345-!'
matches = re.finditer('(?<=move )[0-9]+', strng)
for each in matches:
    print(each.group())
    print(each.start(), each.end())

Output

12345
5 10
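
As a rough sketch of how this might carry over to the full document from the question, the snippet below runs the same lookbehind pattern over all four lines and slices the numbers back out. It assumes the lines are joined by a single "\n" (the real file may use different line endings, which would shift the offsets), and the names document, spans and numbers are only illustrative. It also checks the note about the trailing .*? on this sample.

```python
import re

# Sample document from the question; assumes lines are separated by a single "\n".
document = "move 12345-!\nmove 57496-!\nmove 96038-!\nmove 00528-!"

# The lookbehind asserts "move " to the left without consuming it,
# so each match covers only the digits.
pattern = re.compile(r"(?<=move )[0-9]+")

spans = [(m.start(), m.end()) for m in pattern.finditer(document)]
numbers = [document[start:end] for start, end in spans]

print(spans)    # [(5, 10), (18, 23), (31, 36), (44, 49)]
print(numbers)  # ['12345', '57496', '96038', '00528']

# The trailing lazy ".*?" matches zero characters, so dropping it
# leaves the overall match span unchanged.
with_lazy = re.search(r"move ([0-9]+).*?", document).span()
without_lazy = re.search(r"move ([0-9]+)", document).span()
print(with_lazy == without_lazy)  # True
```

Because end() is exclusive, each (start, end) pair can be used directly as a slice.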
The fourth bird
>>> import re
>>> strng = "move 12345-!"
>>> matches = re.finditer('move ([0-9]+).*?', strng)
>>> for each in matches:
...     print(each.group(1))
...     print(each.start(1), each.end(1))
...
12345
5 10
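
Passing a group number to start(), end() or span() returns the position of that capture group rather than of the whole match. A minimal sketch of how this could be applied to the full document from the question follows; group_spans is a hypothetical helper name, and the sample assumes the lines are separated by a single "\n".

```python
import re

# Sample document from the question; assumes lines are separated by a single "\n".
document = "move 12345-!\nmove 57496-!\nmove 96038-!\nmove 00528-!"


def group_spans(text, pattern=r"move ([0-9]+)", group=1):
    """Return (start, end, value) for the given capture group of every match.

    Hypothetical helper for illustration; span(group) is equivalent to
    (start(group), end(group)).
    """
    return [(*m.span(group), m.group(group)) for m in re.finditer(pattern, text)]


for start, end, value in group_spans(document):
    # The group span can be used directly as a slice into the document.
    assert document[start:end] == value
    print(start, end, value)
```

With this sample, the second number 57496 is reported at 18 to 23, so document[18:23] recovers it.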
AJNeufeld