As the documentation states, using regex.search(string, pos, endpos) is not completely equivalent to slicing the string, i.e. regex.search(string[pos:endpos]). It won't do regex matching as if the string started at pos, so the ^ anchor does not match the beginning of the substring; it only matches the real beginning of the whole string. However, the $ anchor does match the end of the substring, because endpos makes the string behave as if it were only endpos characters long.
>>> import re
>>> re.compile('^am').findall('I am falling in code', 2, 12)
[] # am is not at the beginning
>>> re.compile('^am').findall('I am falling in code'[2:12])
['am'] # am is at the beginning
>>> re.compile('ing$').findall('I am falling in code', 2, 12)
['ing'] # ing is at the end
>>> re.compile('ing$').findall('I am falling in code'[2:12])
['ing'] # ing is at the end
>>> re.compile('(?<= )am').findall('I am falling in code', 2, 12)
['am'] # before am there is a space
>>> re.compile('(?<= )am').findall('I am falling in code'[2:12])
[] # before am there is no space
>>> re.compile('ing(?= )').findall('I am falling in code', 2, 12)
[] # after ing there is no space
>>> re.compile('ing(?= )').findall('I am falling in code'[2:12])
[] # after ing there is no space
>>> re.compile(r'\bm.....').findall('I am falling in code', 3, 11)
[]
>>> re.compile(r'\bm.....').findall('I am falling in code'[3:11])
['m fall']
>>> re.compile(r'.....n\b').findall('I am falling in code', 3, 11)
['fallin']
>>> re.compile(r'.....n\b').findall('I am falling in code'[3:11])
['fallin']
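The comparisons in the session above can also be run as a single script. This is a minimal sketch; the helper name compare and the CASES list are hypothetical, made up for this illustration. For each pattern it calls findall() once with pos/endpos and once on the slice, and marks where the two results differ:

```python
import re

TEXT = 'I am falling in code'

# Hypothetical list of (pattern, pos, endpos) cases, copied from the session above.
CASES = [
    (r'^am', 2, 12),
    (r'ing$', 2, 12),
    (r'(?<= )am', 2, 12),
    (r'ing(?= )', 2, 12),
    (r'\bm.....', 3, 11),
    (r'.....n\b', 3, 11),
]


def compare(pattern, text, pos, endpos):
    """Hypothetical helper: return (findall with pos/endpos, findall on the slice)."""
    regex = re.compile(pattern)
    with_pos = regex.findall(text, pos, endpos)
    with_slice = regex.findall(text[pos:endpos])
    return with_pos, with_slice


for pattern, pos, endpos in CASES:
    with_pos, with_slice = compare(pattern, TEXT, pos, endpos)
    marker = 'same' if with_pos == with_slice else 'DIFFERENT'
    print(f'{pattern!r:14} pos/endpos={with_pos!r:12} slice={with_slice!r:12} {marker}')
```

Run as-is, only the three patterns that look at text before pos (^, the lookbehind and the leading \b) are reported as DIFFERENT; the three that look at the end of the window agree.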
My questions are: Why is it not consistent between matching at the beginning and matching at the end? Why does using pos and endpos treat the end as the real end, but not treat the start as the real start?

Is there any approach to make pos and endpos imitate slicing? Since Python copies the string when slicing instead of just referencing the old one, it would be more efficient to use pos and endpos instead of slicing when working with a big string multiple times.
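To get a rough idea of the copying cost in question, here is a minimal timing sketch. It assumes a hypothetical 1,000,000-character string and a window covering almost all of it; the names BIG, POS, ENDPOS, REGEX, with_pos and with_slice are made up for the illustration. Timings depend on the machine and Python version, so no numbers are claimed here.

```python
import re
import timeit

# Hypothetical test data: a large string and a window covering most of it.
BIG = 'ab' * 500_000
POS, ENDPOS = 1, len(BIG) - 1
REGEX = re.compile(r'zz')  # never matches, so both calls scan the whole window


def with_pos():
    # Restrict the search with pos/endpos: no new string is built.
    return REGEX.findall(BIG, POS, ENDPOS)


def with_slice():
    # Restrict the search by slicing: BIG[POS:ENDPOS] copies ~1,000,000 characters first.
    return REGEX.findall(BIG[POS:ENDPOS])


# Same result either way for this pattern; only the cost differs.
assert with_pos() == with_slice()
for func in (with_pos, with_slice):
    seconds = timeit.timeit(func, number=10)
    print(f'{func.__name__}: {seconds:.4f} s for 10 calls')
```

The equality check only holds because r'zz' uses no ^, lookbehind or \b; with those, the two forms can give different results, as in the session above.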