
I want to parse a token that looks like this:

1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=

I use a regular expression ([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\+/=]{28}), and it does the job when I try it.

Then I try it with C++:

std::regex regexp(R"(([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\+/=]{28}))", 
     std::regex_constants::basic);
std::smatch match;

if (std::regex_search(stringified, match, regexp)) {
    cout << match[0] << ',' << match[1] << ',' << match[2] << endl;
} else {
    cout << "No matches found" << endl;
}

I compile it on Ubuntu 13.10 x64 using GCC 4.8.1 with the -std=c++11 flag, but I always get "No matches found". What am I doing wrong?

Elikill58
Bogdan Kulynych

2 Answers


You were specifying the POSIX basic regex grammar. In that grammar you must escape () and {} for them to act as groups and repetition counts.

I was able to get matches with a few changes:

#include <iostream>
#include <regex>
#include <string>

int main(int argc, const char * argv[]){
    using std::cout;
    using std::endl;
    // Basic grammar: groups are \( \) and repetition counts are \{ \}.
    std::regex regexp(R"(\([0-9]\{16\}\):\([0-9]\{5,20\}\):\([a-zA-Z0-9\\+/=]\{28\}\))",std::regex_constants::basic);
    std::smatch match;
    std::string stringified = "1111111111111111:1384537090:Gl21j08WWBDUCmzq9JZoOXDzzP8=";
    if (std::regex_search(stringified, match, regexp)) {
        // match[0] is the whole match; the three captured fields are match[1] to match[3].
        cout << match[1] << "," << match[2] << "," << match[3]<< endl;
    } else {
        cout << "No matches found" << endl;
    }
    return 0;
}

Or you could use:

std::regex_constants::extended

If you use std::regex_constants::extended you should not escape () and {}
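To make the difference between the two grammars concrete, here is a minimal sketch that writes the same token pattern once for each. The helper `found_with` and the constants `kBasicPattern` and `kExtendedPattern` are hypothetical names introduced only for this illustration, and the character class is written without the backslash since `+`, `/` and `=` are literal inside brackets. It assumes a standard library with a complete `<regex>` implementation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern`, compiled with `grammar`,
// is found somewhere in `input`.
bool found_with(const std::string& pattern,
                std::regex_constants::syntax_option_type grammar,
                const std::string& input) {
    const std::regex re(pattern, grammar);
    return std::regex_search(input, re);
}

// Basic grammar: groups and repetition counts are written with backslashes.
const std::string kBasicPattern =
    R"(\([0-9]\{16\}\):\([0-9]\{5,20\}\):\([A-Za-z0-9+/=]\{28\}\))";

// Extended grammar: the same operators are written without backslashes.
const std::string kExtendedPattern =
    R"(([0-9]{16}):([0-9]{5,20}):([A-Za-z0-9+/=]{28}))";
```

Calling `found_with(kBasicPattern, std::regex_constants::basic, token)` and `found_with(kExtendedPattern, std::regex_constants::extended, token)` with the token from the question should both report a match. Pairing the unescaped pattern with `basic` is the combination from the question, where the parentheses and braces are read as literal characters.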

If you don't want to use a raw string, you can do that as well:

std::regex regexp("([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\\\+/=]{28})",std::regex_constants::extended);

You'll just have to double up on the \\ to properly escape them. The above regex also works with the default regex grammar, std::regex_constants::ECMAScript:

std::regex regexp("([0-9]{16}):([0-9]{5,20}):([a-zA-Z0-9\\\\+/=]{28})");
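As a rough sketch of how the default (ECMAScript) grammar could be wrapped for reuse, the function below validates a whole token and returns its three fields. The `Token` struct, its field names (`id`, `timestamp`, `signature`) and `parse_token` are assumptions made for illustration; the question does not say what the fields mean. It uses `std::regex_match` rather than `std::regex_search` so that the entire input must be a token, which is a design choice and not something the original code requires.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the three colon-separated fields.
// The field names are guesses; the source does not name them.
struct Token {
    std::string id;         // 16 digits
    std::string timestamp;  // 5 to 20 digits
    std::string signature;  // 28 characters from the Base64 alphabet
};

// Returns true and fills `out` when `input` is exactly one well-formed token.
bool parse_token(const std::string& input, Token& out) {
    // No grammar flag is passed, so the default ECMAScript grammar is used.
    static const std::regex pattern(
        R"(([0-9]{16}):([0-9]{5,20}):([A-Za-z0-9+/=]{28}))");
    std::smatch match;
    if (!std::regex_match(input, match, pattern)) {
        return false;
    }
    // match[0] is the whole token; the captures start at index 1.
    out.id = match[1].str();
    out.timestamp = match[2].str();
    out.signature = match[3].str();
    return true;
}
```

With the token from the question, `parse_token` should return true and leave `1111111111111111`, `1384537090` and `Gl21j08WWBDUCmzq9JZoOXDzzP8=` in the three fields.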

It looks like GCC only just added regex support in the development branch for GCC 4.9, so std::regex may simply not work with GCC 4.8.1, whichever grammar you choose.
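If you are unsure whether your toolchain ships a working std::regex, one option is a small runtime probe such as the sketch below. The function name `regex_seems_functional` is hypothetical, and the check is only a heuristic: it assumes that an incomplete implementation will either throw `std::regex_error` or fail to match a trivial pattern.

```cpp
#include <regex>
#include <string>

// Hypothetical self-test: returns true when std::regex can compile and
// match a trivial pattern that uses a counted repetition.
bool regex_seems_functional() {
    try {
        const std::regex probe("[0-9]{2}");
        return std::regex_match(std::string("42"), probe);
    } catch (const std::regex_error&) {
        // The library rejected a pattern that is valid ECMAScript syntax.
        return false;
    }
}
```

Calling this once at startup and printing a clear message when it returns false makes it easier to tell a library problem apart from a mistake in the pattern.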

benjamin
  • R"( is the opening sequence for a raw quote. The first ( is not part of the regex. – jbruni Nov 15 '13 at 18:22
  • You are correct about the raw quote. I've edited my response to reflect that. However, the first match is indeed the full string. You can run the code yourself and see. – benjamin Nov 15 '13 at 18:42
  • I have just tried the code above (both with `basic` and `extended` constants). Still had `No matches found`. What compiler did you use to compile it? – Bogdan Kulynych Nov 15 '13 at 20:29
  • I used clang/llvm on OS X. If you use `std::regex_constants::extended`, you should not escape `()` and `{}` – benjamin Nov 15 '13 at 20:36

It appears that you need to use 'extended' syntax. Change regex_constants::basic to regex_constants::extended and it will match.

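In basic syntax, unescaped ( ) and { } are treated as literal characters, so your pattern captures nothing. Extended syntax treats them as grouping and interval operators, which is what your pattern relies on.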

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04

jbruni