11

How come this code returns true?

string to match: ab

pattern: /^a|b$/

but when I put parentheses like this:

pattern: /^(a|b)$/

it will then return false.

Wiktor Stribiżew
Rei
  • PHP, C++, Python, which regex implementation are you asking about? (Looks like PHP PCRE to me.) – BoltClock Jun 08 '11 at 13:22
  • 1
    I believe they have the same implementation regarding the code I've provided so I've put them on the tags :/ – Rei Jun 08 '11 at 13:24

5 Answers

19

The first pattern, without the parentheses, is equivalent to /(^a)|(b$)/.
The reason is that the pipe operator (the "alternation operator") has the lowest precedence of all regex operators: http://www.regular-expressions.info/alternation.html (third paragraph below the first heading)
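
The equivalence can be checked directly. The following is a minimal sketch, assuming Python's `re` module (one of the implementations mentioned in the comments); the variable names are illustrative only. It shows that the ungrouped pattern succeeds on `ab` because only its first branch, `^a`, has to match.

```python
import re

subject = "ab"

# Without grouping, alternation splits the whole pattern into two branches.
ungrouped = re.search(r"^a|b$", subject)
explicit = re.search(r"(^a)|(b$)", subject)

# Both find the same match: the leading "a" from the first branch.
print(ungrouped.group())   # a
print(explicit.group(1))   # a
print(explicit.group(2))   # None

# With the group, the anchors apply to the whole alternation,
# so the string must be exactly "a" or exactly "b".
grouped = re.search(r"^(a|b)$", subject)
print(grouped)             # None
```

Note that the successful match covers only the `a`, not the whole string, which is why the pattern appears to "return true" for `ab`.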

Daniel Hilgarth
9

/^a|b$/ matches a string which begins with an a OR ends with a b. So it matches afoo, barb, a, b.

/^(a|b)$/ matches a string which consists of a single a or a single b and nothing else. So it matches only a or b.

This happens because alternation | has very low precedence among regex operators.
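
As a rough check of the examples above, here is a small sketch, again assuming Python's `re` module. The names `loose` and `strict` are made up for illustration, and `ab` and `foo` are added to the test strings for contrast.

```python
import re

loose = re.compile(r"^a|b$")     # starts with "a" OR ends with "b"
strict = re.compile(r"^(a|b)$")  # exactly "a" or exactly "b"

results = {}
for s in ["afoo", "barb", "a", "b", "ab", "foo"]:
    # search() reports whether the pattern matches anywhere in the string
    results[s] = (bool(loose.search(s)), bool(strict.search(s)))
    print(f"{s!r}: loose={results[s][0]}, strict={results[s][1]}")
```

Only `a` and `b` satisfy both patterns; `afoo`, `barb`, and `ab` satisfy just the ungrouped one, and `foo` satisfies neither.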

Related discussion

Community
codaddict
4

The first means the string begins with an a or ends with a b.

The second means exactly one character, either an a or a b.

AProgrammer
1

| has lower priority than the anchors, so the first pattern means either ^a or b$ (which is true for ab), as opposed to the second one, which means "a single-character string, either a or b" (which is false for ab).

Blindy
  • it has a **lower** priority, not a higher one! – Daniel Hilgarth Jun 08 '11 at 13:32
  • Er, sure, I meant it's evaluated *before* :) I've always found this lower/higher thing arbitrary and counter-intuitive... – Blindy Jun 08 '11 at 13:34
  • 1
    But it's not evaluated _before_ -- it's evaluated _after_. Lowest priority is evaluated last, highest is first. That should be intuitive. I think it's the "before/after" terminology that's causing the confusion, since outermost/biggest actually suggests lower priority. – Wiseguy Jun 08 '11 at 14:32
1

In ^a|b$ you are matching an a at the beginning or a b at the end.

In ^(a|b)$ you are matching an a or a b that is the only character (so it sits at both the beginning and the end).

Cobra_Fast