How can I get the start and end positions of all matches using the re module? For example, given the pattern r'[a-z]' and the string 'a1b2c3d4', I'd want to get the positions where it finds each letter. Ideally, I'd like to get the text of the match back too.
164

Wiktor Stribiżew

Greg
-
See if this helps [Match Objects](http://www.python.org/doc/2.5.2/lib/match-objects.html) – EBGreen Oct 30 '08 at 14:08
4 Answers
189
import re
p = re.compile("[a-z]")
for m in p.finditer('a1b2c3d4'):
    print(m.start(), m.group())
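
Since the question also asks for end positions, here is a small extension of the snippet above. It is only a sketch: the helper name `find_positions` is made up for illustration, and it relies on `m.end()` alongside `m.start()` and `m.group()`.

```python
import re

def find_positions(pattern, text):
    """Return (start, end, matched_text) for every non-overlapping match."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

print(find_positions(r"[a-z]", "a1b2c3d4"))
# [(0, 1, 'a'), (2, 3, 'b'), (4, 5, 'c'), (6, 7, 'd')]
```

Returning a list keeps the example easy to inspect; for large inputs, iterating over `re.finditer` directly avoids building the whole list.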

Herbert

Peter Hoffmann
-
This doesn't provide the index of other groups in a match. With regex=r'([a-z])(0-9)', m.start() will be for group(), not group(1) – StevenWernerCS Jul 23 '19 at 15:14
-
@StevenWernerCS `start()` accepts an optional group number, so if you want the index of the nth group, use `start(n)` – Hi-Angel Jun 06 '20 at 21:55
-
@hi-angel yep, see my answer below from last year that does just that – StevenWernerCS Jul 16 '20 at 01:03
69
Taken from
span() returns both start and end indexes in a single tuple. Since the match method only checks if the RE matches at the start of a string, start() will always be zero. However, the search method of RegexObject instances scans through the string, so the match may not start at zero in that case.
>>> p = re.compile('[a-z]+')
>>> print p.match('::: message')
None
>>> m = p.search('::: message') ; print m
<re.MatchObject instance at 80c9650>
>>> m.group()
'message'
>>> m.span()
(4, 11)
Combine that with:
In Python 2.2, the finditer() method is also available, returning a sequence of MatchObject instances as an iterator.
>>> p = re.compile( ... )
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator
<callable-iterator object at 0x401833ac>
>>> for match in iterator:
...     print match.span()
...
(0, 2)
(22, 24)
(29, 31)
You should be able to do something along the lines of:
for match in re.finditer(r'[a-z]', 'a1b2c3d4'):
    print(match.span())

Honest Abe

gone
-
You can use it like `re.search(r'abbit', "has abbit of carrot").span(0)` -- `(4, 9)` – Константин Ван Aug 27 '17 at 11:12
-
The 'end index' returned by the `span()` is like the 'stop' in Python's slice notation in that it goes up to but doesn't include that index; see [here](https://stackoverflow.com/a/509295/8508004). – Wayne Nov 20 '19 at 22:11
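
To illustrate the two comments above, here is a minimal sketch (the variable names are arbitrary) showing that slicing the original string with the tuple from `span()` gives back exactly the matched text, because the end index is exclusive:

```python
import re

text = "has abbit of carrot"
m = re.search(r"abbit", text)
start, end = m.span()

# The end index is exclusive, so slicing with the span reproduces the match.
assert text[start:end] == m.group()
print(start, end, text[start:end])  # 4 9 abbit
```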
30
For Python 3.x
from re import finditer
for match in finditer("pattern", "string"):
    print(match.span(), match.group())
For each hit in the string, this prints one line containing a tuple of the match's start and end indices (the end index is exclusive), followed by the matched text itself.

Pedro del Sol

Rams Here
17
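Note that span() and group() accept a group index, which is useful when a regex has multiple capture groups:
import re

regex_with_3_groups = r"([a-z])([0-9]+)([A-Z])"
for match in re.finditer(regex_with_3_groups, string):  # string: the text to search
    for idx in range(0, 4):  # group 0 is the whole match; 1-3 are the capture groups
        print(match.span(idx), match.group(idx))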

StevenWernerCS
-
Thanks, this has proved super useful and seems to be quite buried. Also, in case anyone needs this: when using named capture groups, one can find the index of a group using `match.re.groupindex`, and from there find the corresponding span using the approach you outlined – inconveniently_nonexempt_bee Apr 20 '20 at 14:59
-
@RadioControlled number_of_known_groups_in_the_regex + 1, as range is [start,end) exclusive of end – StevenWernerCS Aug 03 '20 at 17:18
-
@StevenWernerCS so it does not generalize to cases where number of groups is not known... – Radio Controlled Aug 05 '20 at 06:02
-
You can generalize it to an unknown number of groups by changing `range(0, 4)` to `range(0, len(match.groups()) + 1)`; the `+ 1` is needed because `match.groups()` does not include group 0. – Chris Rudd May 17 '23 at 20:15
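
The comments above mention named groups and patterns whose group count is not known in advance. The following is a rough sketch of both ideas, not part of the original answer: the pattern, the input text, and the helper name `group_spans` are all made up for illustration. It uses `match.re.groups` (the number of capture groups in the compiled pattern) and `groupindex` (the mapping from group names to group numbers).

```python
import re

# Hypothetical pattern and input, chosen only for illustration.
pattern = re.compile(r"(?P<lower>[a-z])(?P<digits>[0-9]+)(?P<upper>[A-Z])")
text = "x12Y z3W"

def group_spans(match):
    """Return {group number: (span, text)} for group 0 and every capture group."""
    # match.re.groups is the number of capture groups in the pattern, so
    # range(groups + 1) covers group 0 (the whole match) through the last group.
    return {i: (match.span(i), match.group(i)) for i in range(match.re.groups + 1)}

results = [group_spans(m) for m in pattern.finditer(text)]
for spans in results:
    print(spans)

# groupindex maps each group name to its number; span() also accepts the name directly.
name_to_index = dict(pattern.groupindex)
first = next(pattern.finditer(text))
print(name_to_index["digits"], first.span("digits"))  # 2 (1, 3)
```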