I have some text with lowercase letters, dots, parentheses, and greater-than and less-than signs. Suppose that I want to match substrings on each line that (1) begin with a period, (2) contain any number of letters, and (3) contain any number (possibly zero) of either parentheses or < / > signs, but not both. For example, given this text,
foobar.hello(world)
foobar.hello<world>
foobar.hello>>>world<>(baz)
I want to match .hello(world) on the first line, .hello<world> on the second line, and .hello>>>world<> on the third (since I can't mix parentheses and < / > signs).
I could use two regular expressions to match my desired strings, \.[a-z()]+ and \.[a-z<>]+. However, because I understand regexes to be more efficient when similar patterns are combined, I tried to combine them into a single regex with a logical OR (|):
\.(?:[a-z()]+|[a-z<>]+)
After trying this online, I found that the regex matched my desired substring on the first line, but on the second and third lines it only matched .hello. Yet when I switch the order of the two alternatives, the opposite happens: the first line is matched as .hello, and the second and third lines are matched as desired. This surprised me, since I wouldn't expect order to matter with an OR operator. What's happening here?
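
For reference, here is a minimal sketch that reproduces what I'm seeing. It assumes Python's re module, which may not be the same engine the online tester uses; the names LINES, PARENS_FIRST, ANGLES_FIRST, and first_matches are just placeholders I made up for this example.

```python
import re

# The three sample lines from the question.
LINES = [
    "foobar.hello(world)",
    "foobar.hello<world>",
    "foobar.hello>>>world<>(baz)",
]

# The combined regex as originally written, and the same regex
# with the two alternatives swapped.
PARENS_FIRST = re.compile(r"\.(?:[a-z()]+|[a-z<>]+)")
ANGLES_FIRST = re.compile(r"\.(?:[a-z<>]+|[a-z()]+)")


def first_matches(pattern, lines):
    """Return the first match of `pattern` on each line (None if no match)."""
    results = []
    for line in lines:
        match = pattern.search(line)
        results.append(match.group(0) if match else None)
    return results


if __name__ == "__main__":
    print(first_matches(PARENS_FIRST, LINES))
    # ['.hello(world)', '.hello', '.hello']
    print(first_matches(ANGLES_FIRST, LINES))
    # ['.hello', '.hello<world>', '.hello>>>world<>']
```

With this engine, at least, whichever alternative is listed first seems to be the one that gets used whenever it can match at all, even when the other alternative would have produced a longer match.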