I believe the test is correct. It is instructive to search for "tournament" in all of the libc++ tests under re.alg, and compare how the different engines treat regex("tour|to|tournament"), and how regex_search differs from regex_match.
Let's start with regex_search:
awk, egrep, extended:
regex_search("tournament", m, regex("tour|to|tournament"))
matches the entire input string: "tournament".
ECMAScript:
regex_search("tournament", m, regex("tour|to|tournament"))
matches only part of the input string: "tour".
grep, basic:
regex_search("tournament", m, regex("tour|to|tournament"))
Doesn't match at all. The '|' character is not special.
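Here is a minimal sketch of that comparison, in case you want to try it yourself. The helper name search_with and the "<no match>" marker are my own inventions for illustration; they are not part of libc++ or its tests.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the libc++ tests): returns the text that
// regex_search matches in "tournament" under the given grammar, or the
// marker "<no match>" when the search fails.
std::string search_with(std::regex_constants::syntax_option_type grammar) {
    const std::regex re("tour|to|tournament", grammar);
    std::cmatch m;
    if (std::regex_search("tournament", m, re))
        return m.str();  // m.str() is the whole match, i.e. m[0]
    return "<no match>";
}
```

Going by the results listed above, search_with(std::regex::extended) should return "tournament", search_with(std::regex::ECMAScript) should return "tour", and search_with(std::regex::basic) should return "<no match>".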
awk, egrep and extended will match as much as they can with alternation. However, ECMAScript alternation is "ordered", as specified in ECMA-262: the branches are tried from left to right, and once a branch lets the pattern match, ECMAScript quits searching the remaining branches. The standard includes this example:
/a|ab/.exec("abc")
returns the result "a" and not "ab".
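The same experiment can be sketched with std::regex. The helper name first_alternative is hypothetical, and I use extended as the contrasting grammar only because it is one of the "match as much as they can" engines mentioned above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: searches "abc" for a|ab under the given grammar and
// returns the matched text (an empty string if nothing matches).
std::string first_alternative(std::regex_constants::syntax_option_type grammar) {
    const std::regex re("a|ab", grammar);
    const std::string input = "abc";
    std::smatch m;
    return std::regex_search(input, m, re) ? m.str() : std::string();
}
```

With std::regex::ECMAScript this should give "a", mirroring the ECMA-262 example, while std::regex::extended should give the longer "ab".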
<plug>
This is also discussed in depth in Mastering Regular Expressions by Jeffrey E.F. Friedl. I couldn't have implemented <regex> without this book. And I will freely admit that there is still much more that I don't know about regular expressions than what I know.
At the end of the chapter on alternation the author states:
If you understood everything in this chapter the first time you read
it, you probably didn't read it in the first place.
Believe it!
</plug>
Anyway, back to regex_match: ECMAScript matches only "tour". The regex_match algorithm returns success only if the entire input string is matched. Since only the first 4 characters of the input string are matched, ECMAScript, unlike awk, egrep and extended, returns false with a zero-sized cmatch.
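To see the partial match that makes the difference, here is a rough sketch. Outcome and anchored_search are names I made up for illustration. The sketch deliberately does not assert what regex_match returns for ECMAScript; it only shows how much of the input a search anchored at the start consumes, and what is left over.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: what an anchored search matched and what it left.
struct Outcome {
    bool found;
    std::string matched;  // text consumed by the match
    std::string rest;     // unmatched remainder of the input
};

// Hypothetical helper: searches with match_continuous, so the match must
// begin at the first character of the input, then reports the leftover.
// regex_match can only succeed when nothing is left over.
Outcome anchored_search(const std::string& input, const std::regex& re) {
    std::smatch m;
    const bool found = std::regex_search(
        input, m, re, std::regex_constants::match_continuous);
    if (!found)
        return {false, "", ""};
    return {true, m.str(), m.suffix().str()};
}
```

For "tournament" and the default (ECMAScript) grammar, matched should be "tour" and rest should be "nament". With std::regex::extended, rest should be empty, which is why regex_match succeeds for that grammar.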