I'm using ECMAScript syntax in C++ (std::regex) for input validation and have run into an issue when changing compilers. When using alternation, the first alternative from the left that matches should be used, unless it is disqualified by the rest of the regex. So, for the string "abcde", the expression "ab?|ab?(?:cd|dc)" should match "ab". I've found that different compilers have different opinions about that.
MCVE:
#include <cstddef>
#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string line = "abcde";

    // Case 1: shorter alternative first.
    {
        const std::string RX_ION_TYPE("ab?|ab?(?:cd|dc)");
        const auto regexType = std::regex::ECMAScript;
        std::regex rx_ionType;
        rx_ionType.assign("(" + RX_ION_TYPE + ")", regexType);
        std::smatch match;
        if (std::regex_search(line, match, rx_ionType))
        {
            // Prints the overall match (index 0) and the outer capture group (index 1).
            for (std::size_t i = 0; i < match.size(); i++)
            {
                std::cout << "|" << match.str(i) << "|\n";
            }
        }
        else
        {
            std::cout << "No match.\n";
        }
    }

    // Case 2: longer alternative first.
    {
        const std::string RX_ION_TYPE("ab?(?:cd|dc)|ab?");
        const auto regexType = std::regex::ECMAScript;
        std::regex rx_ionType;
        rx_ionType.assign("(" + RX_ION_TYPE + ")", regexType);
        std::smatch match;
        if (std::regex_search(line, match, rx_ionType))
        {
            for (std::size_t i = 0; i < match.size(); i++)
            {
                std::cout << "|" << match.str(i) << "|\n";
            }
        }
        else
        {
            std::cout << "No match.\n";
        }
    }

    // Case 3: no alternation at the top level, optional suffix instead.
    {
        const std::string RX_ION_TYPE("ab?(?:cd|dc)?");
        const auto regexType = std::regex::ECMAScript;
        std::regex rx_ionType;
        rx_ionType.assign("(" + RX_ION_TYPE + ")", regexType);
        std::smatch match;
        if (std::regex_search(line, match, rx_ionType))
        {
            for (std::size_t i = 0; i < match.size(); i++)
            {
                std::cout << "|" << match.str(i) << "|\n";
            }
        }
        else
        {
            std::cout << "No match.\n";
        }
    }

    return 0;
}
Online: ideone (gcc 5.1), cpp.sh (gcc 4.9.2), rextester
I would expect to get
|ab|
|ab|
|abcd|
|abcd|
|abcd|
|abcd|
which is indeed the case with Visual Studio 2013, gcc 5.1 (ideone), and clang (rextester), but not with gcc 4.9 (locally on Ubuntu and on cpp.sh), where I get
|abcd|
for all three of them.
My question(s):
- Is my assumption that alternation is evaluated from left to right incorrect as far as the standard goes?
- gcc 4.9 seems to be broken, and the problem appears to be fixed in gcc 5. As I'm using CUDA in my actual project, I have to keep using gcc 4.9. Is there any way to make gcc 4.9 follow the standard convention (besides rewriting the regexes)?