164

How can I get the start and end positions of all matches using the re module? For example given the pattern r'[a-z]' and the string 'a1b2c3d4' I'd want to get the positions where it finds each letter. Ideally, I'd like to get the text of the match back too.

Wiktor Stribiżew
Greg

4 Answers

189
import re
p = re.compile("[a-z]")
for m in p.finditer('a1b2c3d4'):
    print(m.start(), m.group())
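
Since the question also asks for the end position, here is a minimal sketch that collects the start, end, and text of each match in one pass. The helper name `find_spans` is made up for illustration and is not part of the `re` module.

```python
import re

def find_spans(pattern, text):
    """Return (start, end, matched_text) for every non-overlapping match."""
    # m.end() is exclusive, like the stop value of a slice.
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

positions = find_spans(r'[a-z]', 'a1b2c3d4')
print(positions)
# [(0, 1, 'a'), (2, 3, 'b'), (4, 5, 'c'), (6, 7, 'd')]
```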
Herbert
Peter Hoffmann
69

Taken from the Regular Expression HOWTO:

span() returns both start and end indexes in a single tuple. Since the match method only checks if the RE matches at the start of a string, start() will always be zero. However, the search method of RegexObject instances scans through the string, so the match may not start at zero in that case.

>>> p = re.compile('[a-z]+')
>>> print p.match('::: message')
None
>>> m = p.search('::: message') ; print m
<re.MatchObject instance at 80c9650>
>>> m.group()
'message'
>>> m.span()
(4, 11)

Combine that with:

In Python 2.2, the finditer() method is also available, returning a sequence of MatchObject instances as an iterator.

>>> p = re.compile( ... )
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator
<callable-iterator object at 0x401833ac>
>>> for match in iterator:
...     print match.span()
...
(0, 2)
(22, 24)
(29, 31)

You should be able to do something along the lines of:

for match in re.finditer(r'[a-z]', 'a1b2c3d4'):
    print(match.span())
Honest Abe
gone
  • You can use it like `re.search(r'abbit', "has abbit of carrot").span(0)` -- `(4, 9)` – Константин Ван Aug 27 '17 at 11:12
  • The 'end index' returned by the `span()` is like the 'stop' in Python's slice notation in that it goes up to but doesn't include that index; see [here](https://stackoverflow.com/a/509295/8508004). – Wayne Nov 20 '19 at 22:11
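
To illustrate the two comments above, this small sketch reuses the string from the first comment and checks that slicing with the span recovers the matched text. The variable names are chosen for illustration only.

```python
import re

text = "has abbit of carrot"
m = re.search(r'abbit', text)

start, end = m.span(0)  # group 0 is the whole match
print(start, end)       # 4 9

# The end index is exclusive, so a plain slice gives back the match.
recovered = text[start:end]
print(recovered == m.group())  # True
```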
30

For Python 3.x

from re import finditer
for match in finditer("pattern", "string"):
    print(match.span(), match.group())

This prints one line per match: the (start, end) span tuple, where end is exclusive, followed by the matched text.

Pedro del Sol
Rams Here
17

Note that span() and group() take a group index when the regex has multiple capture groups; index 0 refers to the whole match.

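import re

string = "a12B"  # hypothetical sample input, not from the original answer
regex_with_3_groups = r"([a-z])([0-9]+)([A-Z])"
for match in re.finditer(regex_with_3_groups, string):
    # idx 0 is the whole match; idx 1-3 are the capture groups
    for idx in range(0, 4):
        print(match.span(idx), match.group(idx))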
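
As a possible variation, the number of groups can be read from the compiled pattern's `groups` attribute instead of hard-coding the range. This is only a sketch; the input `sample` is an invented string.

```python
import re

pattern = re.compile(r"([a-z])([0-9]+)([A-Z])")
sample = "a12B"  # hypothetical input for illustration

match = pattern.search(sample)
# pattern.groups counts the capture groups; add 1 to include group 0.
group_spans = [(idx, match.span(idx), match.group(idx))
               for idx in range(pattern.groups + 1)]
for row in group_spans:
    print(row)
# (0, (0, 4), 'a12B')
# (1, (0, 1), 'a')
# (2, (1, 3), '12')
# (3, (3, 4), 'B')
```

If a group is optional and does not take part in the match, `span()` for that group returns `(-1, -1)` and `group()` returns `None`.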
StevenWernerCS