Suppose I have two PCRE patterns, 1) /foo|oo/ and 2) /oo|foo/, and I match each against the string "foo". Which of the following is the expected result?

  1. 1) matches foo and 2) matches oo. PCRE keeps the "OR" order.
  2. Both match foo. PCRE tries all alternatives and goes for the longest match.
  3. There is no preset rule; the optimizer might reorder the alternatives as it sees fit. It is the developer's duty to avoid ambiguous patterns like this.
  4. There is a rule, but it is not 2.

"Try it and see" seems to rule out 1, but trial and error alone cannot distinguish between 2, 3, and 4.

chx

1 Answer

4) The engine returns the match that starts closest to the beginning of the string. When several alternatives can match at that same position, the one listed first in the pattern wins.

For example:

Matching banana against /na/ (with the match shown in uppercase) gives baNAna, which starts sooner than banaNA. Against /an|b/, the match is Banana (sooner than bANana). Against /ba|./, it is BAnana (same position, so ba is tried before .). Against /.|ba/, it is Banana (same position, so . is tried before ba).
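The examples above can be checked with a short script. This is only a sketch: it uses Python's built-in `re` module rather than PCRE itself, on the assumption that the two behave the same way for these simple patterns (both are backtracking engines that try alternatives from left to right). The helper name `first_match` is made up for illustration.

```python
import re

def first_match(pattern, text):
    """Return (start_index, matched_text) for the first match, or None."""
    m = re.search(pattern, text)
    return (m.start(), m.group(0)) if m else None

# The question's patterns: "oo" cannot match at position 0,
# so both patterns end up matching "foo" there.
print(first_match(r"foo|oo", "foo"))   # (0, 'foo')
print(first_match(r"oo|foo", "foo"))   # (0, 'foo')

# The banana examples: the earliest start position wins, and at the
# same position the alternative written first wins.
print(first_match(r"na", "banana"))    # (2, 'na')
print(first_match(r"an|b", "banana"))  # (0, 'b')
print(first_match(r"ba|.", "banana"))  # (0, 'ba')
print(first_match(r".|ba", "banana"))  # (0, 'b')
```

Note that `/oo|foo/` yielding foo is explained by start position alone, which is why the question's two patterns cannot separate "longest match" from "earliest match" by experiment.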

Amadan
  • What about `/fo|foo/` and `/foo|fo/`? A quick test shows that the order of the alternatives matters in this case. Is that always so? I am not even a beginner, but regexp OR befuddles me. Curiously, I have no problems with other regexp features, but OR is confusing. – chx Feb 03 '16 at 02:02
  • It does matter: since both alternatives match at the same position, the one listed first is matched: `fo` in the first case, `foo` in the second one. – Amadan Feb 03 '16 at 02:09
  • Last clarification question: is this a hard and fast rule for all regexp engines, or might this vary? – chx Feb 03 '16 at 02:13
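The order dependence discussed in the comments for `/fo|foo/` and `/foo|fo/` can be reproduced with a minimal sketch. As an assumption, it again uses Python's `re` module as a stand-in for PCRE, so it says nothing about engines with other matching strategies.

```python
import re

text = "foo"

# Both alternatives can match at position 0, so whichever is
# written first in the pattern is the one that gets matched.
m1 = re.search(r"fo|foo", text)
m2 = re.search(r"foo|fo", text)

print(m1.group(0))  # fo
print(m2.group(0))  # foo
```

Unlike the patterns in the original question, this pair does separate the candidate rules: a longest-match rule would return foo in both cases.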