
Possible Duplicate:
No matches with c++11 regex

I had been using boost::regex for some things, and for some new code I wanted to use std::regex, until I noticed the following inconsistency. Which one is correct?

#include <iostream>
#include <regex>
#include <string>

#include <boost/regex.hpp>

void test(std::string prefix, std::string str)
{
  // Pattern: the prefix, a literal ".*" (written \.\* in the regex), then a lazy ".*?"
  std::string pat = prefix + "\\.\\*.*?";

  std::cout << "Input   : [" << str << "]" << std::endl;
  std::cout << "Pattern : [" << pat << "]" << std::endl;

  {
    std::regex r(pat);
    if (std::regex_match(str, r))
      std::cout << "std::regex_match: true" << std::endl;
    else
      std::cout << "std::regex_match: false" << std::endl;

    if (std::regex_search(str, r))
      std::cout << "std::regex_search: true" << std::endl;
    else
      std::cout << "std::regex_search: false" << std::endl;
  }

  {
    boost::regex r(pat);
    if (boost::regex_match(str, r))
      std::cout << "boost::regex_match: true" << std::endl;
    else
      std::cout << "boost::regex_match: false" << std::endl;

    if (boost::regex_search(str, r))
      std::cout << "boost::regex_search: true" << std::endl;
    else
      std::cout << "boost::regex_search: false" << std::endl;
  }
}

int main(void)
{
  test("FOO", "FOO.*");
  test("FOO", "FOO.*.*.*.*");
}

With gcc 4.7.2 (-std=c++11) and Boost 1.51, I see the following:

Input   : [FOO.*]
Pattern : [FOO\.\*.*?]
std::regex_match: false
std::regex_search: false
boost::regex_match: true
boost::regex_search: true
Input   : [FOO.*.*.*.*]
Pattern : [FOO\.\*.*?]
std::regex_match: false
std::regex_search: false
boost::regex_match: true
boost::regex_search: true

If I change the trailing part of the pattern to the greedy .* instead, I see:

Input   : [FOO.*]
Pattern : [FOO\.\*.*]
std::regex_match: true
std::regex_search: false
boost::regex_match: true
boost::regex_search: true
Input   : [FOO.*.*.*.*]
Pattern : [FOO\.\*.*]
std::regex_match: true
std::regex_search: false
boost::regex_match: true
boost::regex_search: true

Which one should I believe? I would guess that Boost is correct here.
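For comparison, here is a minimal std-only sketch of what regex_search should return for each of the two patterns. It assumes a standard library whose <regex> actually works (for example a current libstdc++ or libc++, not the GCC 4.7 one used above), and first_search_match is a hypothetical helper, not part of any library:

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a library function): returns the text matched by the
// first std::regex_search hit using the default ECMAScript grammar, or an empty
// string if nothing matched. Assumes a conforming <regex> implementation.
std::string first_search_match(const std::string& pattern, const std::string& input)
{
  std::smatch m;
  const std::regex r(pattern);
  if (std::regex_search(input, m, r))
    return m.str(0);  // copy the matched text before m's iterators go stale
  return "";
}
```

With the input "FOO.*.*.*.*", the lazy pattern FOO\.\*.*? should give "FOO.*" (the .*? stops as early as it can) and the greedy FOO\.\*.* should give the whole input. Neither pattern should fail to match, so the "std::regex_search: false" lines in both gcc runs point at the library rather than at the patterns.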

Nim
  • Boost is most likely correct, as not all standard libraries fully implement C++11 yet. The regular expression library seems to be the most overlooked part so far, at least in GCC, while support in Visual C++ seems to be better. – Some programmer dude Nov 23 '12 at 10:15
  • The output you give can't be from the program you give. You have `std::string pat = prefix + "\\.\\*.*?";`, so if the `prefix` was `FOO.*` then `pat` must wind up being `FOO.*\.\*.*?`, not `FOO.*?`. – j_random_hacker Nov 23 '12 at 12:08
  • @j_random_hacker, yes, sorry, I had just changed the code in the snippet; if you run it you'll get the same result. – Nim Nov 23 '12 at 12:16
  • gcc's regex library is unusable. Don't draw any conclusions from what it does or does not do. – Pete Becker Nov 23 '12 at 13:16
  • @PeteBecker, thanks for that - I think I will continue to use boost::regex for the moment... – Nim Nov 23 '12 at 13:22
  • As others have noted, Boost is correct, GCC is wildly wrong. You should file a bug at: http://gcc.gnu.org/bugzilla/. – Eric Niebler Nov 24 '12 at 07:15

1 Answer


gcc 4.7 of course doesn't really support the TR1/C++11 regex, but to give a more general answer: according to its documentation, Boost.Regex's default syntax is Perl 5, while the C++ default is ECMAScript, extended by several locale-dependent elements of POSIX BRE.
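The grammar can also be chosen explicitly by passing a syntax flag to the std::regex constructor; std::regex::ECMAScript is what you get when you pass nothing. The sketch below (search_with is a hypothetical helper, std only) shows that the grammar can change the result in general: ECMAScript alternation takes the first alternative that matches, as Perl does, while POSIX extended takes the longest match. It assumes a conforming <regex> implementation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a library function): first std::regex_search hit
// under an explicitly chosen grammar, or an empty string if nothing matched.
std::string search_with(const std::string& pattern, const std::string& input,
                        std::regex::flag_type grammar)
{
  std::smatch m;
  const std::regex r(pattern, grammar);
  return std::regex_search(input, m, r) ? m.str(0) : std::string();
}
```

For the pattern a|ab and the input "ab", the ECMAScript search should return "a" and the std::regex::extended search should return "ab". The pattern from the question uses nothing where ECMAScript and Perl disagree.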

Specifically, Boost.Regex supports the Perl extensions listed in its documentation, but you're not using any of those.
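One way to see that boundary: a Perl-only construct such as lookbehind, (?<=...), is accepted by Boost.Regex's Perl syntax but is not part of the ECMAScript grammar std::regex uses, so a conforming std::regex constructor should throw std::regex_error for it, while the question's pattern compiles fine. A minimal sketch, where std_regex_accepts is a hypothetical helper:

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a library function): true if std::regex, using its
// default ECMAScript grammar, accepts the pattern; false if construction throws.
bool std_regex_accepts(const std::string& pattern)
{
  try {
    const std::regex compiled(pattern);
    return true;
  } catch (const std::regex_error&) {
    return false;
  }
}
```

On a conforming implementation, std_regex_accepts("FOO\\.\\*.*?") should be true and std_regex_accepts("(?<=FOO)\\.\\*") should be false.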

Now, I got curious and ran your test through two more compilers:

Output from clang (with libc++):

~ $ clang++ -o test test.cc -std=c++11 -I/usr/include/c++/v1 -lc++ -lboost_regex
~ $ ./test
Input   : [FOO.*]
Pattern : [FOO\.\*.*?]
std::regex_match: true
std::regex_search: true
boost::regex_match: true
boost::regex_search: true
Input   : [FOO.*.*.*.*]
Pattern : [FOO\.\*.*?]
std::regex_match: false
std::regex_search: true
boost::regex_match: true
boost::regex_search: true

Output from Visual Studio 2012 (without Boost):

Input   : [FOO.*]
Pattern : [FOO\.\*.*?]
std::regex_match: true
std::regex_search: true
Input   : [FOO.*.*.*.*]
Pattern : [FOO\.\*.*?]
std::regex_match: true
std::regex_search: true

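Looking closer at clang's discrepancy: in the second test it matched the pattern [FOO\.\*.*?] against only [FOO.*] and left [.*.*.*] unmatched. That boils down to matching a lazy quantifier [S*?] differently from Boost and Visual Studio, which, I think, is a bug too: regex_match has to match the entire input, so the lazy .*? should keep expanding until it reaches the end of the string.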
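A small std-only cross-check of that reasoning, assuming a conforming <regex> implementation: regex_match succeeds only if the whole input matches, which is the same question as a regex_search for the pattern anchored at both ends with ^(?:...)$. Both helper names below are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if std::regex_match accepts the whole input.
bool full_match(const std::string& pattern, const std::string& input)
{
  return std::regex_match(input, std::regex(pattern));
}

// Hypothetical cross-check: the same question asked via std::regex_search with
// the pattern wrapped in ^(?:...)$, so any hit must span the entire input.
bool anchored_search(const std::string& pattern, const std::string& input)
{
  return std::regex_search(input, std::regex("^(?:" + pattern + ")$"));
}
```

On a conforming implementation both helpers should return true for the pattern FOO\.\*.*? and the input "FOO.*.*.*.*", which is the Boost and Visual Studio result; the lazy .*? is forced to expand because neither call can succeed while characters are left over.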

Cubbi
  • I believe it's definitely a bug. Since it's classic extended regexp, you can easily check it with `grep -E`, and even grep agrees with Boost (grep is arguably one of the oldest regexp engines around and has been thoroughly abused/tested by users). Funny that Microsoft got it right. – slebetman Nov 26 '12 at 02:19