3

I am trying to use regular expressions in C++11, but my code always throws a std::regex_error with the message "Invalid special open parenthesis." Here is a minimal example that tries to find the first duplicate character in a string:

std::string regexp_string("(?P<a>[a-z])(?P=a)"); // Nothing to be escaped here, right?
std::regex  regexp_to_match(regexp_string);
std::string target("abbab");
std::smatch matched_regexp;
std::regex_match(target, matched_regexp, regexp_to_match);
for(const auto& m: matched_regexp)
{
    std::cout << m << std::endl;
}

Why do I get an error and how do I fix this example?

Adam Hunyadi

3 Answers

1

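There are 2 issues here: std::regex does not support named capturing groups such as (?P<a>...), and std::regex_match requires the whole string to match, so std::regex_search is needed to find a match inside a longer string.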

Use

std::string regexp_string(R"(([a-z])\1)");
std::regex regexp_to_match(regexp_string);
std::string target("abbab");
std::smatch matched_regexp;
if (std::regex_search(target, matched_regexp, regexp_to_match)) {
    std::cout << matched_regexp.str() << std::endl;
}
// => bb


The R"(([a-z])\1)" raw string literal defines the ([a-z])\1 regex that matches any lowercase ASCII letter and then matches the same letter again.

Wiktor Stribiżew
0

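http://en.cppreference.com/w/cpp/regex/ecmascript says that in ECMAScript (the default grammar for std::regex), (? is only valid as the start of (?= (positive lookahead), (?! (negative lookahead) or (?: (non-capturing group), so (?P is a syntax error.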

bipll
  • Can you tell me how I can correct the example then? My goal is to have labeled capture groups – Adam Hunyadi Nov 15 '17 at 21:03
  • Groups that capture what? Do you need `"(a[a-z])(a)"` or something? – bipll Nov 15 '17 at 21:06
  • I need to find recurring patterns, so I want groups that capture only the same characters for a given capture group label. – Adam Hunyadi Nov 15 '17 at 21:08
  • 1
    Do you mean `"(a[a-z])\1"` to capture "ab" in "abab"? You can look for identical matches by using back references: "\N", where N is the group number. – bipll Nov 15 '17 at 21:13
  • In this example, I wanted to capture "bb": https://regex101.com/r/DqgnAs/1 Can I do it with proper labels? – Adam Hunyadi Nov 15 '17 at 21:14
  • 1
    So probably simply `([a-z])\1`: a group that captures a single lowercase character and immediately a backreference to it. – bipll Nov 15 '17 at 21:21
0

Your regex throws because named groups are not supported by std::regex. However, you can still use what is available to find the first duplicate character in a string:

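#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string s = "abc def cde";
    std::smatch m;
    // (\w) captures a character; .*? lazily skips ahead until the
    // lookahead (?=\1) sees the same character again.
    std::regex r("(\\w).*?(?=\\1)");

    if (std::regex_search(s, m, r))
        std::cout << m[1] << std::endl;

    return 0;
}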

Prints

c
Killzone Kid