I wasn't sure whether your original regex would give you what you wanted.
Sorry if I'm late to the party, but others may find this useful too.
import re
p = r"AAAA(?=\s\w+)"  # revised per comment from @Jerry
p2 = r"\w+ AAAA \w+"
s = "foo bar AAAA foo2 AAAA bar2"
l = re.findall(p, s)
l2 = re.findall(p2, s)
print('l: {l}'.format(l=l))
#print(f'l: {l}') is nicer, but online interpreters sometimes don't support it.
# https://www.onlinegdb.com/online_python_interpreter
#I'm using Python 3.
print('l2: {l}'.format(l=l2))
for m in re.finditer(p, s):
    print(m.span())
# A span of (n, m) covers characters n to m-1 with a zero-based index.
# So (8, 12):
# => characters 8 to 11 (zero-based index)
# => the 9th to 12th characters (conventional one-based index)
print(re.findall(p, s)[-1])
Outputs:
l: ['AAAA', 'AAAA']
l2: ['bar AAAA foo2']
(8, 12)
(18, 22)
AAAA
The reason you get two results here, instead of the one you get with the original pattern, is the (?=...) special sauce.
It's called a positive lookahead.
It does not 'consume' text (i.e. advance the cursor) when it matches during regex evaluation: the engine checks that what follows fits, then returns to where the lookahead started.
Although positive lookaheads are written in parentheses, they do not capture anything, much like a non-capturing group.
So, although the whole pattern has to match, the results omit the sequence of word characters (letters, digits and underscores) represented by \w+ and the whitespace before it, \s, which matches characters such as [ \t\n\r\f\v]. (More here)
So I only get back AAAA each time.
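To make the 'does not consume' point concrete, here is a small sketch that compares the lookahead pattern with a version that matches the same text without a lookahead. The consuming pattern r"AAAA\s\w+" and the variable names are mine, added only for comparison; they are not from the question.

```python
import re

s = "foo bar AAAA foo2 AAAA bar2"

# Lookahead version: the whitespace and the following word are checked,
# but they are not part of the match, so each span covers only "AAAA".
lookahead_spans = [m.span() for m in re.finditer(r"AAAA(?=\s\w+)", s)]

# Consuming version (my own comparison pattern): the whitespace and the
# following word become part of the match, so the spans are longer.
consuming = [(m.group(), m.span()) for m in re.finditer(r"AAAA\s\w+", s)]

# A lookahead is written in parentheses but adds no capture group.
group_count = re.compile(r"AAAA(?=\s\w+)").groups

print(lookahead_spans)  # [(8, 12), (18, 22)]
print(consuming)        # [('AAAA foo2', (8, 17)), ('AAAA bar2', (18, 27))]
print(group_count)      # 0
```

Both versions find two matches here, because neither needs text before AAAA. The difference is only in how far the cursor moves after each match.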
p2 here represents the original pattern from the code of @SDD, the person who asked the question.
With that pattern, foo2 is consumed by the first match, so the second AAAA does not match: when the regex engine resumes searching for its next match, the cursor has already moved past foo2, leaving no word in front of the second AAAA.
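If you did want both surrounding contexts, one possible approach (my own suggestion, not something from the question) is to wrap the whole pattern in a lookahead with a capture group inside it, so that nothing is consumed and foo2 stays available for the second match. The \b is my addition; it keeps matches from starting in the middle of a word.

```python
import re

s = "foo bar AAAA foo2 AAAA bar2"

# Original pattern: the first match consumes "foo2", so only one result.
consuming = re.findall(r"\w+ AAAA \w+", s)

# Whole pattern inside a lookahead: each match is zero-width, and the
# capture group inside the lookahead carries the text we want back.
# \b restricts match attempts to positions at a word boundary.
overlapping = re.findall(r"(?=\b(\w+ AAAA \w+))", s)

print(consuming)    # ['bar AAAA foo2']
print(overlapping)  # ['bar AAAA foo2', 'foo2 AAAA bar2']
```

Because the pattern has exactly one capture group, re.findall returns that group's text rather than the (empty) overall match.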
I recommend taking a look at Moondra's YouTube videos if you want to dig in deeper.
He has done a very thorough 17-part series on Python regexes, beginning here.
Here's a link to an online Python Interpreter.