Python Regex - How to Get Positions and Values of Matches

Question

How can I get the start and end positions of all matches using the re module? For example, given the pattern r'[a-z]' and the string 'a1b2c3d4', I'd want to get the positions where it finds each letter. Ideally, I'd like to get the text of the match back too.

See if this helps [Match Objects](http://www.python.org/doc/2.5.2/lib/match-objects.html) — EBGreen, Oct 30 '08 at 14:08

score 189 · Accepted Answer · edited Jul 09 '19 at 11:41

import re
p = re.compile("[a-z]")
for m in p.finditer('a1b2c3d4'):
    print(m.start(), m.group())
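
Since the question asks for the end position as well, here is a minimal sketch that collects the start, end, and matched text of every match into a list. The helper name `find_matches` is hypothetical and not part of the `re` module; it only wraps `re.finditer` with `Match.start()`, `Match.end()`, and `Match.group()`.

```python
import re

def find_matches(pattern, text):
    """Return (start, end, matched_text) for every non-overlapping match."""
    # find_matches is an illustrative helper name, not a standard function
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

matches = find_matches(r"[a-z]", "a1b2c3d4")
print(matches)
# [(0, 1, 'a'), (2, 3, 'b'), (4, 5, 'c'), (6, 7, 'd')]
```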

edited Jul 09 '19 at 11:41 by Herbert · answered Oct 30 '08 at 14:15 by Peter Hoffmann

This doesn't provide the index of other groups in a match: with regex=r'([a-z])(0-9)', m.start() will be for group(), not group(1) – StevenWernerCS Jul 23 '19 at 15:14
@StevenWernerCS `start()` may accept a group number, so if you want an index of nth group, use `start(n)` – Hi-Angel Jun 06 '20 at 21:55
@hi-angel yep, see my answer below from last year that does just that – StevenWernerCS Jul 16 '20 at 01:03

score 69 · Answer 2 · edited Feb 27 '14 at 05:56

Taken from the Regular Expression HOWTO:

span() returns both start and end indexes in a single tuple. Since the match method only checks if the RE matches at the start of a string, start() will always be zero. However, the search method of RegexObject instances scans through the string, so the match may not start at zero in that case.

>>> import re
>>> p = re.compile('[a-z]+')
>>> print(p.match('::: message'))
None
>>> m = p.search('::: message'); print(m)
<re.Match object; span=(4, 11), match='message'>
>>> m.group()
'message'
>>> m.span()
(4, 11)

Combine that with:

In Python 2.2, the finditer() method is also available, returning a sequence of MatchObject instances as an iterator.

>>> p = re.compile( ... )
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator
<callable_iterator object at 0x401833ac>
>>> for match in iterator:
...     print(match.span())
...
(0, 2)
(22, 24)
(29, 31)

You should be able to do something along these lines:

import re
for match in re.finditer(r'[a-z]', 'a1b2c3d4'):
    print(match.span())
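
The end index in each span is exclusive, so slicing the original string with the span gives back the matched text. A short sketch of that relationship, using a slightly longer pattern than the question's (the pattern and the variable names here are illustrative assumptions):

```python
import re

text = "a1b2c3d4"
spans = []
for m in re.finditer(r"[a-z][0-9]", text):
    start, end = m.span()
    # end is exclusive, so slicing with the span reproduces the match
    assert text[start:end] == m.group()
    spans.append((start, end))
    print(start, end, text[start:end])
# 0 2 a1
# 2 4 b2
# 4 6 c3
# 6 8 d4
```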

You can use it like `re.search(r'abbit', "has abbit of carrot").span(0)` -- `(4, 9)` — Константин Ван, Aug 27 '17 at 11:12
The 'end index' returned by the `span()` is like the 'stop' in Python's slice notation in that it goes up to but doesn't include that index; see [here](https://stackoverflow.com/a/509295/8508004). — Wayne, Nov 20 '19 at 22:11

score 30 · Answer 3 · edited Jul 05 '17 at 13:29

For Python 3.x

from re import finditer
for match in finditer("pattern", "string"):
    print(match.span(), match.group())

This prints one line per match in the string: a (start, end) tuple, where end is the index just past the last matched character, followed by the matched text.

edited Jul 05 '17 at 13:29 by Pedro del Sol · answered Jul 05 '17 at 13:08 by Rams Here

score 17 · Answer 4 · answered Jul 23 '19 at 15:22

Note that span() and group() accept a group index, so with multiple capture groups in a regex you can get the position and text of each group:

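import re

regex_with_3_groups = r"([a-z])([0-9]+)([A-Z])"
string = "a12B c3D"  # example input, not from the original answer
for match in re.finditer(regex_with_3_groups, string):
    # index 0 is the whole match; 1 to 3 are the capture groups
    for idx in range(0, 4):
        print(match.span(idx), match.group(idx))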
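
If the number of groups is not known in advance, one option is to read it from the compiled pattern's `groups` attribute instead of hard-coding the upper bound. The sketch below assumes that approach; `group_spans` is a hypothetical helper name and the input string is made up for illustration.

```python
import re

def group_spans(pattern, text):
    """For each match, list (group_index, span, text) for groups 0..N."""
    compiled = re.compile(pattern)
    results = []
    for m in compiled.finditer(text):
        # compiled.groups is the number of capture groups in the pattern;
        # +1 so the last group is included (range excludes its end).
        results.append(
            [(i, m.span(i), m.group(i)) for i in range(compiled.groups + 1)]
        )
    return results

for row in group_spans(r"([a-z])([0-9]+)([A-Z])", "a12B c3D"):
    print(row)
# [(0, (0, 4), 'a12B'), (1, (0, 1), 'a'), (2, (1, 3), '12'), (3, (3, 4), 'B')]
# [(0, (5, 8), 'c3D'), (1, (5, 6), 'c'), (2, (6, 7), '3'), (3, (7, 8), 'D')]
```

A group that did not take part in a match (for example an optional group) reports a span of (-1, -1) and a text of None, so callers may want to skip those entries.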

answered Jul 23 '19 at 15:22 by StevenWernerCS

Thanks, this has proved super useful and seems to be quite buried. Also, in case anyone needs this: when using named capture groups, one can find the index of a group using .re.groupindex, and from there find the corresponding span using the approach you outlined – inconveniently_nonexempt_bee Apr 20 '20 at 14:59
where does the `4` come from? – Radio Controlled Jul 21 '20 at 11:51
@RadioControlled number_of_known_groups_in_the_regex + 1, as range is [start,end) exclusive of end – StevenWernerCS Aug 03 '20 at 17:18
@StevenWernerCS so it does not generalize to cases where number of groups is not known... – Radio Controlled Aug 05 '20 at 06:02
You can generalize it to an unknown number of groups by changing `range(0, 4)` to `range(0, len(match.groups()) + 1)`; the `+ 1` is needed because `groups()` does not count group 0 (the whole match). – Chris Rudd May 17 '23 at 20:15
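
For the named capture groups mentioned in the comments above, a small sketch: `groupindex` on the compiled pattern maps each name to its group number, and `span()` and `group()` also accept the name directly. The group names `letter` and `digits` and the sample string are assumptions made for this example.

```python
import re

pattern = re.compile(r"(?P<letter>[a-z])(?P<digits>[0-9]+)")
m = pattern.search("::: x42 :::")

# groupindex maps each group name to its group number
name_to_number = dict(pattern.groupindex)
print(name_to_number)
# {'letter': 1, 'digits': 2}

# span() and group() accept the group name as well as the number
print(m.span("digits"), m.group("digits"))
# (5, 7) 42
```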
