According to the documentation of Python's `re` module:

'|'

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].

So I tried this and got the expected result. Branch A (`a.*b`) finds a match, so the result of branch B (`b.*e`) is not in the output, even though it would produce a longer match.

>>> import re
>>> re.findall('a.*b|b.*e', 'abcdef')
['ab']

But then how does this work? Here branch A (`b.*d`) can also find a match, yet the output contains the result of branch B (`a.*e`).

>>> re.findall('b.*d|a.*e', 'abcdef')
['abcde']

I am not trying to parse any particular string here; I just want to understand how the regex engine works, so there is no expected or sample output.
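One way to trace the two calls above is to try every start position in the input and, at each position, every branch from left to right. The sketch below does this with a hypothetical helper, `first_match`, which is not part of `re`; it only approximates how a simple top-level alternation is scanned and makes no claim about the engine's actual internals.

```python
import re

def first_match(branches, text):
    """Hypothetical helper: walk the start positions from left to right and,
    at each position, try the branches in the order they are written."""
    for start in range(len(text) + 1):
        for branch in branches:
            # Pattern.match(text, pos) only succeeds if the branch matches
            # beginning exactly at index `pos`.
            m = re.compile(branch).match(text, start)
            if m:
                return start, branch, m.group()
    return None

text = 'abcdef'

# Same branch order as 'a.*b|b.*e': both start positions are tried in turn,
# and the first branch already succeeds at index 0.
print(first_match(['a.*b', 'b.*e'], text))  # (0, 'a.*b', 'ab')

# Same branch order as 'b.*d|a.*e': 'b.*d' cannot match at index 0,
# so 'a.*e' is tried there and wins before index 1 is ever considered.
print(first_match(['b.*d', 'a.*e'], text))  # (0, 'a.*e', 'abcde')
```

In both cases the start position is decided first; the left-to-right order of the branches only matters among branches that can match at that same position.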

jonrsharpe
Kamal
  • It starts from the beginning of your string `'abcdef'`, and finds a match with `a.*e` at the start. – khelwood Apr 02 '19 at 08:58
  • `a` is earlier than `b`, so it matched from `a`. You should bear in mind that the *input string* is also analyzed from left to right (by default), not just the pattern (the pattern is always parsed from left to right). – Wiktor Stribiżew Apr 02 '19 at 08:58
  • @WiktorStribiżew, so precedence goes to the branch whose match *starts* earlier in the input string, not the one that *ends* earlier or gives the *full match*? So even though `b.*d` can be matched within the first 4 characters of the input string, `a.*e` takes precedence because its match starts earlier? – Kamal Apr 02 '19 at 09:08
  • Matches are searched for from left to right by default (that is the only direction `re` supports; the third-party `regex` module can also search in reverse). Once a match has started and consumed some chars, those chars cannot be re-tested after the match succeeds, and thus you may miss the matches you expected (but then comes the question of overlapping matches, also [a very common one](https://stackoverflow.com/questions/5616822/)). – Wiktor Stribiżew Apr 02 '19 at 09:11
  • Ok, thanks for explanation and related answers. – Kamal Apr 02 '19 at 09:17
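
The point in the comments about consumed characters can be seen by comparing a plain `re.findall` with a lookahead-based variant. This is a minimal sketch rather than a recommended idiom: wrapping the alternation in `(?=(...))` makes each match zero-width, so no characters are consumed and every start position gets tested.

```python
import re

text = 'abcdef'

# Plain scan: the match starting at index 0 consumes 'abcde', so the
# 'bcd' that 'b.*d' could match at index 1 is never reported.
consuming = re.findall('b.*d|a.*e', text)
print(consuming)    # ['abcde']

# Zero-width lookahead with a capturing group: nothing is consumed, so
# each start position is tested and findall returns the captured text.
overlapping = re.findall('(?=(b.*d|a.*e))', text)
print(overlapping)  # ['abcde', 'bcd']
```

The second result shows that `b.*d` does match inside the string; it is missing from the first result only because its characters were already consumed by the earlier-starting match.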
