I was parsing a Stack Overflow dump and came across this seemingly innocent question with a small, almost invisible detail: it has 22311 spaces at the end of its text.
I'm using std::regex (for some reason it works better for me than boost::regex) to replace every run of consecutive whitespace with a single space, like this:
std::regex space_regex("\\s+", std::regex::optimize);
...
std::regex_replace(out, in, in + strlen(in), space_regex, " ");
This crashes with SIGSEGV, so I began to investigate.
Test code:
#include <regex>
...
std::regex r("\\s+", std::regex::optimize);
const char* bomb2 = "Small text\n\nwith several\n\nlines.";
std::string test(bomb2);
for (auto i = 0; i < N; ++i) test += " ";
std::string out = std::regex_replace(test.c_str(), r, " ");
std::cout << out << std::endl;
With gcc 5.3.0, compiling with
$ g++ -O3 -std=c++14 regex-test.cpp -o regex-test.out
the maximum N before SIGSEGV shows up is 21818 (for this particular string), and with
$ g++ -O0 -std=c++14 regex-test.cpp -o regex-test.out
it's 12180.
'OK, let's try clang; it's trending and aims to replace gcc.' Never have I been so wrong. With -O0, clang (v3.7.1) crashes at 9696 spaces, fewer than gcc but not by much; with -O3, and even with -O2, it crashes with ZERO spaces.
The crash dump shows huge stack traces (35k frames) of recursive calls to
std::__detail::_Executor<char*, std::allocator<std::__cxx11::sub_match<char*> >, std::__cxx11::regex_traits<char>, true>::_M_dfs
Question 1: Is this a bug? If so, should I report it?
Question 2: Is there a smart way to work around the problem (other than increasing the system stack size, trying other regex libraries, or writing my own function to replace whitespace)?
Update: I have filed a bug report for libstdc++.