
I'm trying to migrate some regex tools from Qt to the standard library. In Qt, I can test whether a regex is valid before using it with isValid().

In the standard library's <regex>, I don't see a way to do this. So for now, I have a try/catch block that constructs a regex from a user-provided string and then tries to match it against a 1-char string, to quickly trigger the std::regex_error exception without loading the actual search string(s), so I can quit early. This is a dirty hack IMO, but I'm not sure if there's a better way to test patterns efficiently with std::regex. I'm basically trying to avoid performance hitches from catching and handling exceptions while feeding the tool automated input.

try
{
    const std::regex regex_exception_trigger(regex_string);
    std::smatch stability_match;
    const std::string test_string = "0";
    if (std::regex_search(test_string.begin(), test_string.end(), stability_match, regex_exception_trigger)) {}
}
catch (const std::regex_error &re)
{
    std::cerr << re.what() << std::endl;
    print_help();
    return exit_enum::BAD_REGEX;
}
kayleeFrye_onDeck
  • In my tests it doesn't get as far as testing the regex. If it is ill-formed, it throws an exception in the constructor. – Galik Jun 26 '17 at 22:36
  • If you have user-defined expressions, you should separate the construction from the usage functions. That way you can tell a failed construction apart from a usage function that throws. I just make a global regex and assign to it upon user entry, which makes this easier. –  Jun 26 '17 at 23:20

2 Answers

1

C++ libraries (especially the standard library) generally subscribe to the philosophy that if you have an instance of a class, then that instance is valid. So if you try to construct an object from bad input, it will throw an exception when you construct it, not when you try to use it.

This is generally good, because it makes it clear where you went wrong: if I try to parse a string with a mis-constructed regex and I get an exception, the natural thought is that there is something wrong with the string, not with the regex.

Your use case doesn't fit well with this mold, since the C++ standard library assumes a poorly constructed regex to be exceptional (hence the exception).
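To illustrate the construct-time failure described above, here is a minimal sketch of an isValid()-style check built only on the std::regex constructor. The name `is_valid_regex` is hypothetical (nothing like it exists in <regex>), and the sketch assumes the default ECMAScript grammar:

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of <regex>: reports whether `pattern`
// can be compiled with the default (ECMAScript) grammar.
bool is_valid_regex(const std::string& pattern)
{
    try {
        // The constructor itself throws std::regex_error on a malformed
        // pattern, so no trial match against a dummy string is needed.
        const std::regex probe(pattern);
        (void)probe;
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Note that this compiles the pattern once just to check it; if you go on to use the regex, it is cheaper to keep the constructed object than to build it a second time.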

Exceptions are cheap when they aren't thrown (i.e. there is little overhead to a try-catch block if nothing is actually thrown), but throwing and catching an exception can be expensive. If you expect to receive a lot of incorrectly constructed regular expressions and you think catching the exceptions will noticeably affect performance (it may well be acceptable despite the cost, so you should measure it), you will need to consider a different tool to validate the regular expressions before constructing them.
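One rough way to run such a test is to time repeated constructions of a valid and an invalid pattern. The following is only a sketch: `TimingResult` and `time_constructions` are names made up for this example, and the numbers will vary a lot with compiler, standard library and optimization level.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct TimingResult {
    std::size_t rejected;               // constructions that threw std::regex_error
    std::chrono::microseconds elapsed;  // wall-clock time for all iterations
};

// Constructs a std::regex from `pattern` `iterations` times and measures
// the total time, including the cost of any exceptions thrown and caught.
TimingResult time_constructions(const std::string& pattern, std::size_t iterations)
{
    std::size_t rejected = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        try {
            const std::regex re(pattern);
            (void)re;
        } catch (const std::regex_error&) {
            ++rejected;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return {rejected,
            std::chrono::duration_cast<std::chrono::microseconds>(stop - start)};
}
```

Comparing `time_constructions("abc+", n).elapsed` with `time_constructions("(abc", n).elapsed` for a realistic `n` should give a first impression of whether the throwing path matters for your input volume.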

Boost also provides a regular expression library that the standard library's version is based on. The syntax will be very similar. Boost's version has a no_except flag that can be passed to the regex constructor and will suppress any exceptions from an invalid string. The reason I gave above is presumably why this flag was not included in the standard library version. If you need this behavior, you could consider using the Boost version instead.

SJL
  • I've only seen C++ try/catch blocks catching something, like `catch(std::exception_name &e)`, or catching everything possible to catch `catch(...)` -- would I just do, `try{}catch{}` ? Or did you mean catch(...) would bypass the performance hit for not specifying the exception to catch? – kayleeFrye_onDeck Jun 26 '17 at 23:08

If you want to catch all errors, do it all in a try{}catch{}catch{}catch{} sequence.

I'd split up the construction from the usage.

pseudo code

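#include <regex>
#include <stdexcept>
#include <string>

std::regex Rx;
bool bIsConstructError = false;

////////////////////////////////////////
// Compiles the user-supplied pattern into Rx; returns false if it is rejected.
bool SetRx( const std::string& strRx )
{
   bIsConstructError = false;
   try
   {
      Rx.assign( "" );      // reset first, so a failed assign leaves an empty regex
      Rx.assign( strRx );   // throws std::regex_error on a malformed pattern
   }
   catch ( const std::regex_error& )
   {
      bIsConstructError = true;
      return false;
   }
   catch ( const std::out_of_range& )
   {
      bIsConstructError = true;
      return false;
   }
   catch ( const std::runtime_error& )
   {
      bIsConstructError = true;
      return false;
   }
   return true;
}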

////////////////////////////////////////
// Searches strTarget with Rx; returns false if Rx is unusable or the search throws.
bool findText( const std::string& strTarget )
{
   if ( bIsConstructError )
      return false;

   bool bRet = true;

   std::smatch match;   // renamed from _M: underscore + capital is a reserved identifier
   std::string::const_iterator start = strTarget.begin();
   std::string::const_iterator end = strTarget.end();

   try
   {
      if ( std::regex_search( start, end, match, Rx ) )
      {
          // do something
      }
   }
   catch ( const std::out_of_range& )
   {
      bRet = false;
   }
   catch ( const std::runtime_error& )   // also catches std::regex_error
   {
      bRet = false;
   }
   return bRet;
}
  • I'm trying to avoid the performance hit of handling them in a catch. I just want my app to gracefully close immediately on a bad regex, not do anything with the exceptions. – kayleeFrye_onDeck Jun 27 '17 at 02:35
  • I don't experience a performance hit with my apps. What do you mean? I don't see how you can protect against exceptions any other way. Further, you become blind when your app just locks up. Btw, the only catch statements needed are the ones I listed here. –  Jun 27 '17 at 16:36
  • Also, these regex functions all unwind the stack before throwing, and don't degrade memory. –  Jun 27 '17 at 16:42
  • Quite possibly I was confused when reading up about exception-handling for C++. It seems the performance hits are only incurred when _thrown_, if this answer is correct: https://stackoverflow.com/a/1897979/3543437 | If that's the case, this answer does a good job breaking down all the possible exceptions to handle. – kayleeFrye_onDeck Jun 27 '17 at 17:33
  • You use the regex library, and you're allowing user-supplied regexes. That alone is the reason to use a try/catch, not any internal usage. If construction or usage throws and nothing catches it, the application will probably halt without giving you a chance to see what went wrong or giving the user another chance. And it doesn't look good to have your app just quit for no apparent reason. Most likely there is a syntax error, but other times it could be a complexity error or a stack error. –  Jun 27 '17 at 18:21