58

OK, this isn't the original program I had this problem in, but I duplicated it in a much smaller one. Very simple problem.

main.cpp:

#include <cstdio>   // for printf
#include <regex>
using namespace std;

int main()
{
    regex r1("S");
    printf("S works.\n");
    regex r2(".");
    printf(". works.\n");
    regex r3(".+");
    printf(".+ works.\n");
    regex r4("[0-9]");
    printf("[0-9] works.\n");
    return 0;
}

Compiled successfully with this command, no error messages:

$ g++ -std=c++0x main.cpp

The last line of g++ -v, by the way, is:

gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3)

And the result when I try to run it:

$ ./a.out 
S works.
. works.
.+ works.
terminate called after throwing an instance of 'std::regex_error'
  what():  regex_error
Aborted

It happens the same way if I change r4 to \\s, \\w, or [a-z]. Is this a problem with the compiler? I might be able to believe that C++11's regex engine has different ways of saying "whitespace" or "word character," but square brackets not working is a stretch. Is it something that's been fixed in 4.6.2?

EDIT:

Joachim Pileborg has supplied a partial solution: passing an extra regex_constants parameter selects a syntax that supports square brackets. However, none of basic, extended, awk, or ECMAScript seems to support backslash-escaped terms like \\s, \\w, or \\t.

EDIT 2:

Using raw strings (R"(\w)" instead of "\\w") doesn't seem to work either.

Shay Guy
  • I haven't used the regex classes yet, but are you sure you're using the correct one? I recall C++11 having several different ways to interpret regex. – Pubby Nov 09 '11 at 03:42
  • Is there any useful info in the `regex_error`? – jswolf19 Nov 09 '11 at 03:49
  • @jswolf19 I don't know how to determine that; exception handling isn't one of my stronger skills. – Shay Guy Nov 09 '11 at 03:55
  • @jswolf19 there certainly is. in OP's case, it contains `regex_constants::error_brack` ("mismatched brackets"), although it's not terribly helpful. – Cubbi Nov 09 '11 at 03:55
  • 3
    Do you know how to catch an exception? If you catch `regex_error` it will have a method called `code()` that will return a constant from `std::regex_constants::error_type`. See http://en.cppreference.com/w/cpp/regex/error_type for their meanings. – Some programmer dude Nov 09 '11 at 05:55
  • As of G++ 4.7.2, there's a problem with gcc. I have the same problem; however, VC++ 2012 works perfectly. – Sambatyon Feb 22 '13 at 09:57
  • 1
    The problem is still there with g++-4.8.1. No \w, no [a-z]. – rtlgrmpf Jun 17 '13 at 10:43
  • related question ["Is gcc 4.8 or earlier buggy about regular expressions? "](https://stackoverflow.com/q/12530406/52074) – Trevor Boyd Smith Dec 05 '18 at 15:22
  • 2
    Possible duplicate of [Is gcc 4.8 or earlier buggy about regular expressions?](https://stackoverflow.com/questions/12530406/is-gcc-4-8-or-earlier-buggy-about-regular-expressions) – Trevor Boyd Smith Dec 05 '18 at 15:24
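
The suggestion in the comments to catch `regex_error` and inspect `code()` could be sketched roughly as follows. This is only an illustration: `describe_regex` is a hypothetical helper name, not something from the thread, and it only distinguishes the two error codes the commenters mention.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the thread): tries to compile `pattern`
// with the default grammar and reports what happened.
std::string describe_regex(const std::string& pattern)
{
    try {
        std::regex r(pattern);  // default grammar is ECMAScript
        (void)r;                // only construction matters here
        return "ok";
    } catch (const std::regex_error& e) {
        // code() returns a value of type std::regex_constants::error_type.
        if (e.code() == std::regex_constants::error_brack)
            return "error_brack";
        if (e.code() == std::regex_constants::error_escape)
            return "error_escape";
        return std::string("other regex_error: ") + e.what();
    }
}
```

On a library with a complete `<regex>` implementation, `describe_regex("[0-9]")` should return `"ok"`; on the GCC 4.6.1 setup from the question, the commenters report `error_brack` for that pattern.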

3 Answers

34

Update: <regex> is now implemented and released in GCC 4.9.0


Old answer:

ECMAScript syntax accepts [0-9], \s, \w, etc.; see ECMA-262 (15.10). Here's an example with boost::regex, which also uses the ECMAScript syntax by default:

#include <boost/regex.hpp>

int main(int argc, char* argv[]) {
  using namespace boost;
  regex e("[0-9]");
  return argc > 1 ? !regex_match(argv[1], e) : 2;
}

It works:

$ g++ -std=c++0x *.cc -lboost_regex && ./a.out 1

According to the C++11 standard (28.8.2), basic_regex() uses the regex_constants::ECMAScript flag by default, so it must understand this syntax.

As for the question itself, "Is this C++11 regex error me or the compiler?": it is the compiler's standard library. gcc-4.6.1 doesn't support C++11 regular expressions (28.13).
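
For comparison, here is a minimal sketch of the same check with `std::regex`, assuming a standard library with a complete `<regex>` implementation (for libstdc++ that means GCC 4.9 or later, per the update above). `full_match` is a hypothetical helper name introduced only for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if all of `text` matches `pattern`
// under the default grammar.
bool full_match(const std::string& text, const std::string& pattern)
{
    // No flag given, so this is equivalent to passing
    // std::regex_constants::ECMAScript explicitly.
    const std::regex re(pattern);
    return std::regex_match(text, re);
}
```

With such a library, `full_match("7", "[0-9]")`, `full_match(" ", "\\s")`, and `full_match("a", R"(\w)")` should all be true, which is the behavior the question expected.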

jfs
  • Like I said in my edit, my current problem is backslashes, not square brackets. And I don't understand your last bit -- I've been using that compiler and some of these regexes have been working. Besides, I don't see any reference on that page to 4.6.1 in particular. – Shay Guy Nov 09 '11 at 17:27
  • @Shay Guy: 1. ECMAScript syntax also supports backslashes and much more (the syntax is similar to the perl5 regex syntax if it is more familiar to you). 2. If gcc's current trunk (future) doesn't support regular expressions then it follows that gcc-4.6.1 (past) doesn't support it. – jfs Nov 09 '11 at 22:12
  • I still don't follow. If gcc doesn't support regex, then why did _any_ of these work without throwing an error? Even square brackets worked with some `regex_constants` parameters, despite backslashes not working right. – Shay Guy Nov 09 '11 at 23:11
  • @Shay Guy: [the status page that I've linked above](http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2011) says that there is *partial* support for some things. Go to the page, look for `28` (regular expressions section). – jfs Nov 09 '11 at 23:50
  • u can use clang for std::regex afaik, building it is somewhat straightforward: http://solarianprogrammer.com/2011/10/16/llvm-clang-libc-linux/ – smerlin Nov 10 '11 at 02:09
  • @smerlin: I saw it, but the question is about gcc. It is even simpler to install `boost::regex`: `apt-get install libboost-regex-dev`. The interface is almost the same only the namespace is different. – jfs Nov 10 '11 at 02:29
  • 1
    @ShayGuy - This is clearly the correct answer. It is unfortunate that g++'s libstdc++ doesn't yet support full regex's. One solution that leaves your code free to switch away from boost is your own namespace and a bunch of using directives to import the stuff you need from the boost or std namespace. – Omnifarious Nov 10 '11 at 13:51
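
The last comment's idea (your own namespace that imports names from either `boost` or `std`) might look something like this sketch. The macro `USE_BOOST_REGEX`, the namespaces `rx_impl` and `rx`, and the function `is_single_digit` are all hypothetical names chosen for illustration.

```cpp
#include <string>

// Define USE_BOOST_REGEX (hypothetical macro) to build against Boost.Regex;
// otherwise the standard library implementation is used.
#ifdef USE_BOOST_REGEX
#include <boost/regex.hpp>
namespace rx_impl = boost;
#else
#include <regex>
namespace rx_impl = std;
#endif

// The rest of the program only ever names rx::...
namespace rx {
    using rx_impl::regex;
    using rx_impl::smatch;
    using rx_impl::regex_match;
    using rx_impl::regex_search;
}

// Example use: true if `s` is exactly one decimal digit.
bool is_single_digit(const std::string& s)
{
    static const rx::regex digit("[0-9]");
    return rx::regex_match(s, digit);
}
```

Switching implementations then means changing one macro definition (and, for Boost, linking with `-lboost_regex`) rather than editing every call site.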
31

The error is because creating a regex by default uses ECMAScript syntax for the expression, which doesn't support brackets. You should declare the expression with the basic or extended flag:

std::regex r4("[0-9]", std::regex_constants::basic);

Edit: It seems that libstdc++ (the C++ standard library that ships with GCC) doesn't fully implement regular expressions yet. Its status document says that the Modified ECMAScript regular expression grammar is not implemented yet.

Some programmer dude
  • 4
    That's sort of disappointing. Why did they make the default be something so strange? How annoying. Is there good documentation to be found anywhere on what the various syntaxes are? – Omnifarious Nov 09 '11 at 04:01
  • 2
    @ShayGuy - This page contains a list of the possible syntaxes: http://en.cppreference.com/w/cpp/regex/syntax_option_type - I bet that `::std::regex_constants::extended` would work. – Omnifarious Nov 09 '11 at 04:03
  • @Omnifarious See http://en.cppreference.com/w/cpp/regex/syntax_option_type for the different expression types. – Some programmer dude Nov 09 '11 at 04:03
  • @JoachimPileborg - Found it. But it doesn't say what things are allowed or not allowed by the various syntaxes. – Omnifarious Nov 09 '11 at 04:05
  • @Omnifarious There are links to the `basic`, `extended` and `awk` syntaxes. For the rest you just have to google it yourself it seems. – Some programmer dude Nov 09 '11 at 04:07
  • @Omnifarious Same story. Works with `[0-9]`, but not `\\w`. (Would there be a way to set the syntax once for the entire program, or at least the entire sourcefile?) – Shay Guy Nov 09 '11 at 04:08
  • @ShayGuy - Just make your own function for creating them that supplies the extra argument. Make it `inline` in an anonymous namespace. – Omnifarious Nov 09 '11 at 04:14
  • @ShayGuy The error code in the exception for `\\w` is `error_escape` ("The expression contains an invalid escaped character or a trailing escape."). It seems to work for `\\\\w`, though whether that is correct I don't know. – Some programmer dude Nov 09 '11 at 04:15
  • @JoachimPileborg `regex r5("\\\\w", regex_constants::awk)` doesn't throw an error, but `regex_match("foo",r5)` returns `false`. Curiously, the same happens with `\\foo`. – Shay Guy Nov 09 '11 at 04:34
  • @ShayGuy After reading a little, it seems that regular expressions of `ECMAScript` should support things like `\w`. Unfortunately I don't have a C++11 compiler at work and can't test it. – Some programmer dude Nov 09 '11 at 06:00
  • 7
    -1: [ECMAScript supports `[0-9]`](http://stackoverflow.com/questions/8060025/is-this-c11-regex-error-me-or-the-compiler/8061172#8061172). – jfs Nov 09 '11 at 06:58
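
The suggestion in the comments to wrap regex creation in a helper that always supplies the syntax flag could be sketched like this. `make_regex` and `has_digit` are hypothetical names, and the choice of `extended` is just an assumption for the example; any `syntax_option_type` value could go there.

```cpp
#include <regex>
#include <string>

namespace {

// Hypothetical helper: every regex in this source file goes through here,
// so the grammar is chosen in exactly one place.
inline std::regex make_regex(const std::string& pattern)
{
    return std::regex(pattern, std::regex_constants::extended);
}

} // anonymous namespace

// Example use: true if `text` contains at least one decimal digit.
bool has_digit(const std::string& text)
{
    return std::regex_search(text, make_regex("[0-9]"));
}
```

Because the helper sits in an anonymous namespace, it is local to the source file, which matches the "at least the entire sourcefile" scope asked about above.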
7

Regex support improved between gcc 4.8.2 and 4.9.2. For example, the regex =[A-Z]{3} was failing for me with:

Regex error

After upgrading to gcc 4.9.2, it works as expected.
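
A quick way to check a given toolchain against that pattern is a sketch like the one below, assuming a complete `<regex>` implementation. `contains_code` is a hypothetical function name used only here.

```cpp
#include <regex>
#include <string>

// Hypothetical check: true if `text` contains '=' followed by
// three uppercase ASCII letters.
bool contains_code(const std::string& text)
{
    static const std::regex re("=[A-Z]{3}");
    return std::regex_search(text, re);
}
```

If constructing the regex throws `std::regex_error` instead, the library most likely predates the complete implementation shipped with GCC 4.9.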

Drew Noakes
  • 5
    By "support improved between gcc 4.8.2 and 4.9.2" what you really mean to say is that "GCC 4.8 has no regex support; GCC 4.9 has regex support". [The availability of the experimental header in pre-4.9 versions is an unfortunate and misleading historical legacy](http://stackoverflow.com/a/12665408/560648). – Lightness Races in Orbit Dec 29 '14 at 17:30
  • 4
    Actually 4.8 has the std::regex class and friends, but the support for regex language was incomplete, hence your code would compile but not match as expected. – Drew Noakes Dec 29 '14 at 20:06
  • 1
    I am capable of reading a short comment to the end and even clicking on links. We seem to disagree on the definition of "support". – Drew Noakes Dec 29 '14 at 20:45
  • 1
    Not necessarily. GCC 4.8 simply did not support the regex implementation that was shipped, it being known not to be an actual C++11 regex implementation. It was not compliant and was never deemed as so. Such is the misfortune of it having been shipped in a header accessible as `<regex>`. This is my point. – Lightness Races in Orbit Dec 30 '14 at 05:23