I'm using ECMAScript syntax with C++ `std::regex` for input validation and have run into an issue when changing compilers. With alternation, the leftmost alternative that matches should be used, unless it is disqualified by the rest of the regex. So, for the string "abcde", the expression "ab?|ab?(?:cd|dc)" should match "ab". I've found that different compilers disagree on this.

MCVE:

#include <regex>
#include <string>
#include <iostream>

int main()
{
    std::string line = "abcde";
    {
        const std::string RX_ION_TYPE("ab?|ab?(?:cd|dc)");
    
        const auto regexType = std::regex::ECMAScript;
    
        std::regex rx_ionType;
    
        rx_ionType.assign(
            "(" + RX_ION_TYPE + ")"
            , regexType);
    
        std::smatch match;
    
        if (std::regex_search(line, match, rx_ionType))
        {
            for (std::size_t i = 0; i < match.size(); i++)
            {
                std::cout << "|" << match.str(i) << "|\n";
            }
            
        }
        else
        {
            std::cout << "No match.\n";
        }
    }

    {
        const std::string RX_ION_TYPE("ab?(?:cd|dc)|ab?");
    
        const auto regexType = std::regex::ECMAScript;
    
        std::regex rx_ionType;
    
        rx_ionType.assign(
            "(" + RX_ION_TYPE + ")"
            , regexType);
    
        std::smatch match;
    
        if (std::regex_search(line, match, rx_ionType))
        {
            for (std::size_t i = 0; i < match.size(); i++)
            {
                std::cout << "|" << match.str(i) << "|\n";
            }
            
        }
        else
        {
            std::cout << "No match.\n";
        }
    }
    {
        const std::string RX_ION_TYPE("ab?(?:cd|dc)?");

        const auto regexType = std::regex::ECMAScript;

        std::regex rx_ionType;

        rx_ionType.assign(
            "(" + RX_ION_TYPE + ")"
            , regexType);

        std::smatch match;

        if (std::regex_search(line, match, rx_ionType))
        {
            for (std::size_t i = 0; i < match.size(); i++)
            {
                std::cout << "|" << match.str(i) << "|\n";
            }

        }
        else
        {
            std::cout << "No match.\n";
        }
    }

    return 0;
}

Online: ideone (gcc 5.1), cpp.sh (gcc 4.9.2), rextester (clang)

I would expect to get

|ab|
|ab|
|abcd|
|abcd|
|abcd|
|abcd|

which is indeed the case with Visual Studio 2013, gcc 5.1 (ideone), and clang (rextester), but not with gcc 4.9 (Ubuntu locally and cpp.sh), where I get

|abcd|

for all three of them.

My question(s):

  1. Is my assumption that the alternation is read from left to right incorrect as far as the standard goes?
  2. gcc 4.9 seems to be broken, and this appears to be fixed in gcc 5. As I'm using CUDA in my actual project, I have to keep using gcc 4.9. Is there any way to make gcc 4.9 follow the standard convention (besides rewriting the regexes)?
Avi Ginsburg
  • C++11 was still work-in-progress in gcc 4.9 days. Without digging into gcc's revision history, it seems quite likely that, back in the 4.9 days, `std::regex` support was not yet complete. The current version of gcc is 6.1.1. gcc 4.9 is ancient history. – Sam Varshavchik Jul 21 '16 at 12:37
  • As you may notice, when trying to compile it with gcc4.9.2 you get the warning *library support for the ISO C++ 2011 standard. **This support is currently experimental**, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.* – Thomas Ayoub Jul 21 '16 at 12:38
  • @ThomasAyoub When compiling with 4.9.3 and `-Wall -std=c++11` I don't get such a warning, so I *assumed* that it is/was no longer too experimental. – Avi Ginsburg Jul 21 '16 at 12:41
  • @SamVarshavchik That ancient history is the latest compiler supported by CUDA 7.5 (current release version). – Avi Ginsburg Jul 21 '16 at 13:09
  • As much as I dislike suggesting boost, C++ 11's `std::regex` was inspired by `boost::regex`. You could probably get by using that for now. – md5i Jul 21 '16 at 13:13
  • @md5i We've been weaning out most other unnecessary boost usages, so I'm not sure what to say to that. It may be the way to go. – Avi Ginsburg Jul 21 '16 at 13:16
