Python Regex Or '|' operator internal working

Question

According to the Python `re` documentation:

'|'

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].
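A minimal sketch of the "never greedy" behaviour described in the quote, using two made-up patterns that are not from the question. Both alternatives can match at the same starting position, so the one listed first wins even when the other would match more text:

```python
import re

# Both alternatives can match at index 0 of 'abcd'.
# The leftmost alternative that succeeds is accepted, even if it is shorter.
first = re.search(r'ab|abcd', 'abcd').group()

# Swapping the order lets the longer alternative be tried (and accepted) first.
second = re.search(r'abcd|ab', 'abcd').group()

print(first)   # ab
print(second)  # abcd
```

Note that this ordering rule only decides between alternatives tried at the same starting position in the string.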

So I tried this and got the expected result. Regex A (`a.*b`) found a match, so regex B (`b.*e`) is not in the result, even though it would produce a longer match.

>>> import re
>>> re.findall('a.*b|b.*e', 'abcdef')
['ab']

But then, how does this work? Even though regex A (`b.*d`) can find a match, the result of regex B (`a.*e`) is in the output.

>>> re.findall('b.*d|a.*e', 'abcdef')
['abcde']

I am not trying to parse any particular string here; I just want to understand how the regex engine works, so there is no expected or sample output.

It starts from the beginning of your string `'abcdef'`, and finds a match with `a.*e` at the start. — khelwood, Apr 02 '19 at 08:58
`a` is earlier than `b`, so it matched from `a`. You should bear in mind that the *input string* is also analyzed from left to right (by default), not just the pattern (the pattern is always parsed from left to right). — Wiktor Stribiżew, Apr 02 '19 at 08:58
@WiktorStribiżew, so precedence goes to the alternative whose match *starts* earlier in the input string, not to the one whose match *ends* earlier or *fully matches* first? So even though `b.*d` can be matched within just the first 4 characters of the input string, `a.*e` gets higher precedence because its match starts earlier than the other's? — Kamal, Apr 02 '19 at 09:08
Matches are searched for from left to right by default (and that is the only way `re` does it, not `regex`). If a match started and then consumed some chars, those chars, upon a successful match, cannot be re-tested, and thus you may miss the matches you expected (but then comes the question of overlapping matches, also [a very common one](https://stackoverflow.com/questions/5616822/)). — Wiktor Stribiżew, Apr 02 '19 at 09:11
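The point made in the comments can be sketched in code. The helper `first_match_by_position` below is hypothetical and only imitates the left-to-right scan; it is not how `re` is implemented internally. It tries the whole pattern at index 0, then index 1, and so on, and the alternatives compete only at the index currently being tried:

```python
import re

text = 'abcdef'
pattern = re.compile(r'b.*d|a.*e')

def first_match_by_position(pattern, text):
    """Hypothetical helper: try the whole pattern at each start index in turn."""
    for start in range(len(text) + 1):
        m = pattern.match(text, start)  # anchored at `start`
        if m:
            return start, m.group()
    return None

# At index 0, 'b.*d' fails (the character is 'a'), but 'a.*e' succeeds,
# so the scan never reaches index 1, where 'b.*d' could have matched.
result = first_match_by_position(pattern, text)
print(result)  # (0, 'abcde')

# Each alternative on its own, for comparison.
print(re.search(r'b.*d', text).span())  # (1, 4)
print(re.search(r'a.*e', text).span())  # (0, 5)
```

Since `re.findall` resumes after the end of the previous match, only `'f'` is left to scan after `'abcde'`, which is consistent with the single-element result in the question.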
