I'm trying to find all occurrences of a substring inside a string and print their start and end indices using a regular expression.
For example, with string = 'bbbcbb' and sub = 'bb', I must get (0,1) (1,2) (4,5) as my output.
My code:
import re
matches = list(re.finditer(r'bb(?=[a-zA-Z]|$)', 'bbbcbb'))
The output:
[<_sre.SRE_Match object; span=(0, 2), match='bb'>,
<_sre.SRE_Match object; span=(4, 6), match='bb'>]
I went through the documentation at https://docs.python.org/3/library/re.html and, to my understanding, the lookahead assertion should work as follows:
- At position 0, it will match 'bb' with "bb" followed by "b", i.e. 'bb'bcbb
- At position 1, it will match 'bb' with "bb" followed by "c", i.e. b'bb'cbb
- Then it will not match until position 4, where it will match 'bb' with "bb" followed by $, i.e. bbbc'bb'
Why is the lookahead assertion ignoring the b'bb'cbb at the (1,3) position? Or is my understanding of the lookahead assertion flawed?
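For comparison, here is a minimal sketch of one way to get the overlapping spans listed in the example. It rests on the fact that re.finditer returns non-overlapping matches: once 'bb' is consumed at (0, 2), the scan resumes at position 2, so a match starting at position 1 is never tried. The sketch puts the whole substring inside the lookahead so that each match is zero-width and consumes nothing. The helper name overlapping_spans is hypothetical, and the `[a-zA-Z]|$` condition from the original pattern is left out for simplicity.

```python
import re

def overlapping_spans(sub, string):
    """Return inclusive (start, end) index pairs for every occurrence of sub,
    including occurrences that overlap each other."""
    # The pattern is only a lookahead, so each match is empty and the
    # scanner moves forward one character at a time instead of skipping
    # past the characters of the previous match.
    pattern = re.compile('(?=' + re.escape(sub) + ')')
    return [(m.start(), m.start() + len(sub) - 1)
            for m in pattern.finditer(string)]

print(overlapping_spans('bb', 'bbbcbb'))  # [(0, 1), (1, 2), (4, 5)]
```

The end index here is inclusive to match the expected output above; m.start() + len(sub) would give the exclusive end that span= reports.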