
For starters, I'm primarily using GCC 4.8 on Ubuntu 14.04. Everything standard.

OK, so I looked around SO and it seems that GCC's support for regexes is somewhat faulty. However, the problem I am having is so basic that it makes me wonder whether I am doing something that is not permitted, or whether the C++ library might be even more compromised than just the regex part.

I had two simple regexes that would be used in different parts of the program, so I defined them in the global namespace as follows.

    regex DATAFILE_PATTERN("^(.+?)(|\\.(db|[ng]?dbm|dir|pag|newhash))$");
    regex TEMP_FILES("^.*(tmp|ypxfr_map).*$");

The code compiled nicely, without a single warning. However, when I ran it, it failed with std::regex_error (malformed regex) before even reaching main().

That exception was outrageous, since there was nothing wrong with the regexes, as you can see. Even so, I simplified them a little to try to find out how they could be malformed. The result was that, depending on the expression (all of them valid!), I would sometimes get the std::regex_error, and sometimes just a segmentation fault, without any language-level exception.
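An exception thrown while a namespace-scope object is being initialized cannot be caught, because no try block encloses that initialization. Constructing the regex inside a function makes the error inspectable. A minimal sketch, assuming a standard library with a working `<regex>`; `describe_regex_error` is a hypothetical helper name, not part of my program:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns an empty string if the pattern compiles,
// otherwise a description of the std::regex_error that was thrown.
std::string describe_regex_error(const std::string& pattern) {
    try {
        std::regex re(pattern);   // construction happens at run time, inside a try block
        return std::string();
    } catch (const std::regex_error& e) {
        // e.code() holds a std::regex_constants::error_type value;
        // e.what() is an implementation-defined message.
        return std::string("regex_error: ") + e.what();
    }
}
```

Calling it with each pattern from inside main() shows which one, if any, the library rejects, instead of the program terminating before main().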

Then I moved the regexes from the namespace scope to inside the functions where they were used, and now they worked nicely.
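A minimal sketch of that change, keeping each pattern defined in one place by wrapping it in a function-local static, so it is constructed on first use rather than before main(). The function names are hypothetical, and this assumes a library whose `<regex>` actually works; it does not repair a broken implementation:

```cpp
#include <regex>
#include <string>

// Hypothetical accessors: each regex is built the first time the
// function is called, not during static initialization.
const std::regex& datafile_pattern() {
    static const std::regex re("^(.+?)(|\\.(db|[ng]?dbm|dir|pag|newhash))$");
    return re;
}

const std::regex& temp_files_pattern() {
    static const std::regex re("^.*(tmp|ypxfr_map).*$");
    return re;
}

// Hypothetical helper: returns the name with a known database suffix
// removed (capture group 1), or the name unchanged if nothing matches.
std::string strip_db_suffix(const std::string& name) {
    std::smatch m;
    if (std::regex_match(name, m, datafile_pattern())) {
        return m[1].str();
    }
    return name;
}

// Hypothetical helper: true if the name looks like a temporary file.
bool is_temp_file(const std::string& name) {
    return std::regex_match(name, temp_files_pattern());
}
```

Different parts of the program can then call `strip_db_suffix` and `is_temp_file` without each one declaring its own copy of the regex.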

Is there any explanation for that better than just "it's a GCC bug"? I mean something like a stack frame limit during initialization of globals and/or static data that might have been reached due to regex's complicated internals. If so, could it be overcome with some tunable?

I tried the same code with clang++ 3.4 with the same libstdc++, and the results were similar, so it makes me think that the compiler might not be the one to blame (unless clang uses tunables similar to the ones in GCC). I tried to install LLVM's libc++, but so far I could not get it to work (lots of undefined symbols at link time, which I have not yet been able to figure out; any clues would also be appreciated).

I wonder if other compilers, such as MSVS or a properly configured Clang, would have similar issues.

Paulo1205
  • All of this text and you still haven't told us what version of GCC you're using, and what your code looks like. – Lightness Races in Orbit Dec 07 '15 at 18:57
  • I don't know if it's a bug or not but your two patterns are badly written and may cause a catastrophic backtracking. – Casimir et Hippolyte Dec 07 '15 at 18:57
  • [Is gcc 4.8 or earlier buggy about regular expressions?](http://stackoverflow.com/a/12665408/3832970) – Wiktor Stribiżew Dec 07 '15 at 18:58
  • @stribizhev: note that he obtains a problem too with clang++. – Casimir et Hippolyte Dec 07 '15 at 19:00
  • @CasimiretHippolyte Doesn't matter. He's still using the same standard library implementation. – T.C. Dec 07 '15 at 19:02
  • "That exception was outrageous" - I don't know. Doesn't sound any more outrageous than [using something without looking at its documentation](https://gcc.gnu.org/onlinedocs/gcc-4.8.5/libstdc++/manual/manual/status.html#status.iso.2011) or asking a question on SO without performing a basic search. – T.C. Dec 07 '15 at 19:11
  • @CasimiretHippolyte, I would avoid backtracking if I could, but the parts that I am trying to insulate happens on some systems (BSDs, AIX, Solaris), and not on others (Linux). Do you have any suggestion? Besides, why would it not work only when declared at namespace level, and work finely when declared within a block? If regexes were outright broken -- OK, they are in this version --, I'd expect them to fail similarly in both contexts. – Paulo1205 Dec 07 '15 at 19:15
  • @Paulo1205 `<regex>` is broken prior to gcc 4.9. It may appear to work in some use cases, but it's not something to be relied upon. If you can't upgrade gcc, look into using Boost.Regex. – Praetorian Dec 07 '15 at 19:27
