According to regex101.com and an application I have called "RegExRx", this is a valid regular expression:

(?<=\().*

Namely, this should match everything that follows an open-parenthesis character. Here's how regex101.com analyzes it:

/(?<=\().*/
(?<=\() Positive Lookbehind - Assert that the regex below can be matched
\( matches the character ( literally
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]

However, the C++11 program below throws:

libc++abi.dylib: terminating with uncaught exception of type std::__1::regex_error: The expression contained mismatched ( and ).

This is Clang as shipped with Xcode 5.1.1.

Question: Should Clang accept this regex? How can I get a std::regex that is semantically equivalent to this one?

#include <iostream>
#include <regex>
#include <string>

int main(int argc, const char * argv[])
{
    std::string x{ "(?<=\\().*" };
    std::cout << "Here is my regex string " << x << std::endl;
    std::regex{ x }; // throws
    return 0;
}

Edit: My question is different from the proposed duplicate because I asked "How can I get a std::regex that is semantically equivalent to this one?" The semantically equivalent workaround was very helpfully provided by user hwnd below.

Matthew James Briggs

2 Answers

C++11's std::regex uses ECMAScript regular expression syntax by default, and that syntax does not support lookbehind.

An equivalent of the above regular expression would be the following:

\\((.*)

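Note: The pattern is shown as it would appear inside a C++ string literal, so the regex itself is \((.*). The overall match includes the open parenthesis; the capturing group ( ... ) retains everything that follows it.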
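
For illustration, here is a minimal sketch of how the capture group can be read back with std::regex_search. The helper name after_open_paren is made up for this example and is not part of any library; it assumes you only care about the first open parenthesis on a line.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name chosen for this example only).
// Returns the text following the first '(' in `input`,
// or an empty string when `input` contains no '('.
std::string after_open_paren(const std::string& input)
{
    // Same pattern as above: a literal '(' followed by a capturing group.
    static const std::regex re{ "\\((.*)" };

    std::smatch m;
    if (std::regex_search(input, m, re)) {
        // m[0] is the whole match, including the '('.
        // m[1] is the capture group: everything after the '('.
        return m[1].str();
    }
    return {};
}
```

Calling after_open_paren("f(x, y)") yields "x, y)". Since . does not match a newline in this grammar, the capture stops at the end of the line.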

hwnd

The constructor you're using is:

explicit basic_regex( const CharT* s,
                      flag_type f = std::regex_constants::ECMAScript );

which indicates that the default regex format is ECMAScript (or JavaScript, as most of us know it).

If you set the regex flavour in regex101.com to JavaScript instead of PCRE, you'll see the same error: (? is not recognized, so the ) doesn't have anything to match.
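
To check this locally rather than on regex101.com, a small sketch like the following tries to construct a pattern with the default flag and reports whether std::regex_error was thrown. The name compiles is a hypothetical helper, and the exact error code and message depend on the standard library implementation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` is accepted by std::regex
// under the default ECMAScript grammar, false if construction throws.
bool compiles(const std::string& pattern)
{
    try {
        std::regex re{ pattern };   // default flag: std::regex_constants::ECMAScript
        (void)re;                   // only the construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

With this, compiles("(?<=\\().*") returns false for the lookbehind pattern from the question, while compiles("\\((.*)") returns true for the capture-group version.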

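Note that none of the std::regex syntax types allow lookbehinds. The ECMAScript grammar does allow lookaheads, (?=...) and (?!...), but the POSIX-based grammars (basic, extended, awk, grep, egrep) allow neither.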

rici