7
#include <boost/regex.hpp>
#include <string>
#include <iostream>

using namespace boost;
static const regex regexp(
        "std::vector<"
            "(std::map<"
                   "(std::pair<((\\w+)(::)?)+, (\\w+)>,?)+"
             ">,?)+"
        ">");

std::string errorMsg =
"std::vector<"
        "std::map<"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>"
        ">,"
        "std::map<"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>,"
                "std::pair<Test::Test, int>"
        ">"
">";
int main()
{
    smatch result;
    if(regex_match(errorMsg, result, regexp))
    {  
        for (unsigned i = 0; i < result.size(); ++i)
        {  
            std::cout << result[i] << std::endl;
        }
    }

//    std::cout << errorMsg << std::endl;

    return 0;
}

this produces:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error>
>'   what():  Ran out of stack space trying to match the regular expression.

compiled with

g++ regex.cc -lboost_regex

EDIT

my platform:

g++ (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
libboost-regex1.42
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
So the latest Ubuntu 64 bit
Ross Rogers
  • 23,523
  • 27
  • 108
  • 164

2 Answers2

13

((\\w+)(::)?)+ is one of the so called "pathological" regular expressions -- it's going to take exponential time, because you have two expressions which are dependent upon each other one right after the other. That is, it fails due to catastrophic backtracking.

Consider if we follow the example of the link, and reduce "something more complicated" to "x". Let's do that with \\w:

  • ((x+)(::)?)+

Let's also assume that our input is never going to have ::. Having this actually makes the regex more complex, so if we throw out complexity then we really should be making things simpler if nothing else:

  • (x+)+

Now you've got a textbook nested quantifier problem like that detailed in the link above.

There are a few ways to fix this but the simplest way is probably to just disallow backtracking on the inner match using the atomic group modifier "(?>":

  • ((?>\\w+)(::)?)+
Billy ONeal
  • 104,103
  • 58
  • 317
  • 552
  • 2
    A pathological sub-regex could be the problem, but it's not the one you mentioned -- `\w` is just `[A-Za-z0-9_]` or thereabouts, so it already excludes `,` and `:` just like your proposed fix and so will take the same time. Also I think you need an opening paren on both regexes. – j_random_hacker Dec 21 '10 at 02:25
  • @j_random_hacker: Hmm.. good point. My proposed fix doesn't fix the problem, but catastrophic backtracking is the issue.. give me a sec to fix it. – Billy ONeal Dec 21 '10 at 02:27
  • Can you reproduce the problem? – Industrial-antidepressant Dec 21 '10 at 02:39
  • @nice blonde stupid girl: Sorry -- I had the wrong syntax before. It should be `(?> )` not `+?` in order to disallow backtracking. No, I can't reproduce the problem, because I don't have a working installation of `boost::regex` to play with at the moment. – Billy ONeal Dec 21 '10 at 02:40
  • @Billy: Yep, and I was in the process of writing more or less exactly that :) I would suggest `([^>]+)` as a simpler alternative though (for the whole `((\\w+)(::)?)+, (\\w+)>,?)+` mess). – j_random_hacker Dec 21 '10 at 02:42
  • @Billy: That version is not match – Industrial-antidepressant Dec 21 '10 at 02:43
  • @j_random_hacker: But I want to use the matched values from std::pair<> that is why I use this form. – Industrial-antidepressant Dec 21 '10 at 02:46
  • 1
    @nice: `std::vector<(std::map<(std::pair<((?>\w+)(::)?)+, (\w+)>,?)+>,?)+>` worked for me. (I used RegexBuddy though for testing rather than Boost) – Billy ONeal Dec 21 '10 at 02:49
  • @nice: If Boost's regex library works the way most do, then each `()` group only captures the last instance matched, not all instances. Is that really what you want? But if somehow it is, just use `([^,]+), (\\w+)` instead of `((\\w+)(::)?)+, (\\w+)`. – j_random_hacker Dec 21 '10 at 02:56
  • @j_random_hacker: but when I define BOOST_REGEX_MATCH_EXTRA, than I can every submatch through captures() – Industrial-antidepressant Dec 21 '10 at 03:00
  • @nice: Well that feature is news to me! And it's almost certainly the reason why you're blowing your stack while others can't reproduce the problem. Don't you think you should have mentioned the fact that you enabled this experimental feature? -1 to your question. – j_random_hacker Dec 21 '10 at 03:11
  • @j_random_hacker: I did not used that feature now. I am just planning to use it, because I used it somewhere else and I know it is working. So the error message is present without BOOST_REGEX_MATCH_EXTRA – Industrial-antidepressant Dec 21 '10 at 03:13
  • 1
    @nice: OK fair enough, but turning that feature on will almost certainly blow the stack if any backtracking is done, so I'd advise against doing that. – j_random_hacker Dec 21 '10 at 03:19
1

Tested it locally and it worked fine, my guess is your compiler is doing something weird.

What version of gcc? what platform? which version of boost?

 -> ./regex
std::vector<std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>,std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>>
std::map<std::pair<Test::Test, int>,std::pair<Test::Test, int>,std::pair<Test::Test, int>>
std::pair<Test::Test, int>
Test
Test
::
int
OneOfOne
  • 95,033
  • 20
  • 184
  • 185
  • My test was with `boost-1.42 / gcc version 4.5.1 (Gentoo 4.5.1-r1 p1.4, pie-0.4.5)`, is it possible to try a different version of gcc and/or boost? – OneOfOne Dec 21 '10 at 02:24
  • Strange. I can try it somewhere else, but later. I will see my aptitude for an other version – Industrial-antidepressant Dec 21 '10 at 02:34
  • FYI, I'm hitting this webpage because code that worked under an old version of boost (1.43.0) hit this stack space error on a new version of boost (1.53.0). Thus, the version of boost does seem pertinent to whether the issue is manifested. – Ross Rogers Apr 16 '13 at 20:45