4

I am using std::regex r("-?[0-9]*(.[0-9]+)?(e-?[0-9]+)?") to validate numbers (ints/fixed point/floating point). The MWE is below:

#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <algorithm>

using namespace std;

bool isNumber(string s) {
  // can ignore all whitespace
  s.erase(remove(s.begin(), s.end(), ' '), s.end());
  std::regex r("-?[0-9]*(.[0-9]+)?(e-?[0-9]+)?");
  return regex_match(s, r);
}

int main() {
  std::vector<string> test{
    "3", "-3", // ints
      "1 3", " 13", " 1 3 ", // weird spacing
      "0.1", "-100.123123", "1000.12312", // fixed point
      "1e5", "1e-5", "1.5e-10", "1a.512e4", // floating point
      "a", "1a", "baaa", "1a", // invalid
      "1e--5", "1e-", //invalid
      };

  for (auto t : test) {
    cout.width(20); cout << t << " " << isNumber(t) << endl;
  }

  return 0;
}

I notice the compile time is quite large compared to what I was expecting:

  • gcc 5.4 -O0 -std=c++11, 2.3 seconds
  • gcc 5.4 -O2 -std=c++11, 3.4 seconds
  • clang++ 3.8 -O0 -std=c++11, 1.8 seconds
  • clang++ 3.8 -O2 -std=c++11, 3.7 seconds

I use this for an online judge submission, which has a time limit on the compilation stage.

So, the obivous questions:

  • why is the compilation time so large? I'm under the impression that when I use regexes in vim/emacs/grep/ack/ag etc. (on the same machine) the compilation really takes a lot less than this.
  • is there any way to reduce the compilation time of a regular expression in C++?
paul-g
  • 3,797
  • 2
  • 21
  • 36
  • It's not so much the compilation time of the regex, but of all of the code required to implement all of the regex functionality. Compare the time to using 10 regexes. – chris Jan 25 '17 at 20:40
  • It looks like @chris is right. Try compiling with -save-temps, and look at the size of the resulting intermediate files. – G. Sliepen Jan 25 '17 at 20:48
  • The regex is probably going to build a finite state machine on the back end. That takes a lot of time. Most likely that is what you are seeing. – NathanOliver Jan 25 '17 at 20:48
  • 1
    As for the pattern itself, the dot must be escaped. – Wiktor Stribiżew Jan 25 '17 at 20:51

1 Answers1

1

You can certainly lighten the computational load of your regex by appropriately escaping the decimal point: -?[0-9]*(\.[0-9]+)?(e-?[0-9]+)? This will of course prevent you from returning a false positive on numbers like: "1 3" (don't worry that's a good thing those were 2 numbers.) But in this case and many others when you stoop to using regexes "Now you have 2 problems".

Using a istringstream would provide a more specialized, robust solution to this problem:

bool isNumber(const string& s) {
    long double t;
    istringstream r(s);

    return r >> t >> ws && r.eof();
}

Live Example

Community
  • 1
  • 1
Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288