The regex a[bcd]*b matches, from the first position where a match can start, the longest possible substring (because * is greedy) made of:
- a: starts with a
- [bcd]*: followed by any number of characters from the set (b, c, d); zero is allowed, so this part can match the empty string
- b: ends with b
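As a quick check of this breakdown, here is a minimal sketch using Python's re module. The sample strings are made up for illustration and do not come from the original question.

```python
import re

pattern = re.compile(r'a[bcd]*b')

# Sample input chosen for illustration only.
m = pattern.search("xxabcbdexx")

# The greedy * first consumes "bcbd", then gives characters back
# until the final b of the pattern can match.
print(m.group())  # abcb
print(m.span())   # (2, 6)

# Zero repetitions are allowed: [bcd]* matches the empty string here.
empty_case = pattern.search("ab").group()
print(empty_case)  # ab
```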
EDIT: following a comment, backtracking occurs in the following example:
>>> import re
>>> re.findall(r'a[bcd]*b', "abcxb")
['ab']
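- abc matches a[bcd]*, but the next character is x where b is expected
- the engine backtracks: ab matches a[bcd]*, but the next character is c, not b
- a alone also matches a[bcd]* (because the empty string matches [bcd]*), and the next character is b
- finally returns ab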
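The backtracking steps can be made visible with a small hand-written simulation. This is only a sketch for this one pattern: match_at is a hypothetical helper, not part of the re module, and it does not claim to reflect how the re engine is implemented internally.

```python
import re

def match_at(text, start, charset="bcd"):
    """Try to match a[bcd]*b at text[start:], recording each attempt."""
    trace = []
    if not text.startswith("a", start):
        return None, trace
    # Greedy step: consume as many characters from the set as possible.
    end = start + 1
    while end < len(text) and text[end] in charset:
        end += 1
    # Backtracking step: give back one character at a time until b fits.
    while end > start:
        consumed = text[start:end]
        if text.startswith("b", end):
            trace.append((consumed, "ok"))
            return text[start:end + 1], trace
        trace.append((consumed, "fail"))
        end -= 1
    return None, trace

result, trace = match_at("abcxb", 0)
print(trace)   # [('abc', 'fail'), ('ab', 'fail'), ('a', 'ok')]
print(result)  # ab

# The simulation agrees with the real engine on this input.
print(re.findall(r'a[bcd]*b', "abcxb"))  # ['ab']
```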
Concerning greediness: the metacharacter * after a single character, a character set or a group means "any number of times", matching as many as possible. Some regexp engines also accept the sequence of metacharacters *?, which changes the behavior to match as few as possible, for example:
>>> r2 = r'a[bcd]*?b'
>>> re.findall(r2,"abcbde")
['ab']
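To see the difference side by side, the following sketch runs the greedy and the non-greedy pattern on the same input as above. The variable names are chosen for illustration only.

```python
import re

text = "abcbde"  # same input as in the example above

greedy = re.findall(r'a[bcd]*b', text)   # * takes as much as possible
lazy = re.findall(r'a[bcd]*?b', text)    # *? takes as little as possible

print(greedy)  # ['abcb']
print(lazy)    # ['ab']
```

The greedy pattern extends to the last b it can reach, while the non-greedy one stops at the first b.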