12

Starting from C++11, the <regex> header defines the functions std::regex_match, std::regex_search and std::regex_replace in §28.11. I guess there is a valid reason for these functions not to be noexcept, but I couldn't find any reference about what they might throw or why.

  1. What types of exceptions may these functions throw?
  2. What runtime conditions cause these exceptions to be thrown?
    • Does the standard ensure that for some sets of arguments these functions never throw, e.g. does it ensure that regex_match(anyString, regex(".")) never throws?

PS: Since some of these exceptions probably inherit from std::runtime_error, they might throw std::bad_alloc during their construction.

jotik
  • 1
    There's the obvious [`regex_error`](http://en.cppreference.com/w/cpp/regex/regex_error) – MSalters Mar 22 '16 at 11:22
  • Hey buddy, I don't think you will be able to get around handling exceptions for the find/match/replace functions. For the compile you probably can. I'm using and assuming the Boost::Regex handling of exceptions as a guide because, let's face it, C++11 regex originated from the boost::regex engine. Viewing the Boost code is easier to decipher. If you don't handle exceptions that are thrown, the OS will handle them. If you think defining _noexcept_ (aka: BOOST_NO_EXCEPTIONS) will stop runtime throws during the execution of their implementation, you'd be wrong. –  Mar 26 '16 at 20:44
  • 1
    (cont'd) The primary exception of these functions is _ran_out_of_stack_ during internal recursion. Even setting the equivalent BOOST_REGEX_NON_RECURSIVE (the safer, but slower way) could still generate a throw. It's better to actually catch these than to get an obscure system message. Deep in the bowels of find/match/replace is the _imp_ of these functions that can _catch_ and rethrow everything. The exception can be from the OS or anywhere. Even catching won't guarantee that your thread doesn't lock up due to infinite recursion. –  Mar 26 '16 at 20:48
  • For safety, use my catching layout below. It works! –  Mar 26 '16 at 20:53

5 Answers

6

regex_error is the only exception mentioned as being thrown from any of the classes or algorithms in <regex>. There are two basic categories of errors: malformed regular expressions and failure to process the match.

The constructors for basic_regex can throw a regex_error (as per [re.regex.construct]/3, /7, /14, and /17) if the argument (or sequence) passed in is "not a valid regular expression." The same is true if you try to assign an invalid regular expression to a basic_regex ([re.regex.assign]/15).
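As a minimal sketch of this first category, the helper below (the name `compiles` is made up for illustration and is not part of <regex>) reports whether a pattern can be constructed. It relies only on the behaviour described above, namely that construction throws regex_error for an invalid expression:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` is accepted by std::regex
// (default ECMAScript grammar), false if construction throws regex_error.
bool compiles(const std::string& pattern)
{
    try {
        std::regex re(pattern);   // may throw std::regex_error
        (void)re;                 // only the construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;             // e.code() would identify the syntax error
    }
}
```

For example, `compiles("a+b")` should be true, while `compiles("(")` should be false because of the unmatched parenthesis.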

Separately from that, the algorithms can also throw regex_error ([re.except]/1):

The functions described in this Clause report errors by throwing exceptions of type regex_error. If such an exception e is thrown, e.code() shall return either regex_constants::error_complexity or regex_constants::error_stack.

where those two error codes mean ([re.err]):

error_complexity: The complexity of an attempted match against a regular expression exceeded a pre-set level.
error_stack: There was insufficient memory to determine whether the regular expression could match the specified character sequence.
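
A hedged sketch of how a caller might act on those two codes: the wrapper below (the names `MatchStatus` and `try_match` are assumptions for illustration) calls regex_match on an already-constructed regex and translates a thrown regex_error into a status value by inspecting e.code():

```cpp
#include <regex>
#include <string>

// Hypothetical status type for this sketch.
enum class MatchStatus { Matched, NotMatched, TooComplex, OutOfStack, OtherRegexError };

// Runs regex_match and maps a regex_error to a status instead of propagating it.
// Other exception types (e.g. std::bad_alloc) are deliberately not caught here.
MatchStatus try_match(const std::string& text, const std::regex& re)
{
    try {
        return std::regex_match(text, re) ? MatchStatus::Matched
                                          : MatchStatus::NotMatched;
    } catch (const std::regex_error& e) {
        if (e.code() == std::regex_constants::error_complexity)
            return MatchStatus::TooComplex;
        if (e.code() == std::regex_constants::error_stack)
            return MatchStatus::OutOfStack;
        return MatchStatus::OtherRegexError;  // not expected per [re.except]/1
    }
}
```

With a simple pattern such as `std::regex(".")`, a one-character input yields `Matched` and a two-character input yields `NotMatched`; the two error states only arise when the implementation's limits are exceeded.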

Barry
  • I think [§28.11.1](http://eel.is/c++draft/re.except#1) best answers my question. English is not my native language, but the wording of §28.11.1 seems to leave open the possibility of other types of exceptions being thrown. Do you have a comment on that? I don't care about the constructors, only functions in §28.11, so kindly ask you to base your answer around §28.11. – jotik Mar 23 '16 at 09:22
  • 1
    @jotik Saying that this type of exception may be thrown in no way implies that a different type of exception may also be thrown. – Barry Mar 23 '16 at 10:14
  • Ok. But is it possible to ensure these never throw? I.e. according to the standard (not implementations), is it possible to construct regular expressions using which the functions in §28.11 never throw? Second, am I correct, if I say that these functions only throw if some resource limit is exceeded? – jotik Mar 23 '16 at 17:16
  • @jotik I don't understand what about this you find confusing. They throw if the regex you provide is an invalid regex. That has nothing to do with resource limits. – Barry Mar 23 '16 at 17:42
  • I thought only the constructors (e.g. [std::basic_regex::basic_regex()](http://eel.is/c++draft/re.regex#re.regex.construct-3)) throw if the regex is invalid, not the functions in [§28.11](http://eel.is/c++draft/re.alg#re.except). – jotik Mar 23 '16 at 21:25
  • @jotik How are you possibly interpreting "The algorithms described in this subclause may throw an exception of type regex_error." to mean "the algorithms don't throw" ?!? Clearly, they can throw! – Barry Mar 23 '16 at 21:33
  • I want to know on what runtime conditions they throw and whether it is possible to avoid them. I know I can't make them `noexcept` by type, I just want to know what runtime conditions are required for them not to throw. E.g. `int first(vector const & v) { return v.at(0u); }` CAN throw, but it only throws IF `v` is empty. To avoid exceptions when calling `first`, I can code to ensure never to pass it an empty vector. I want to know about similar runtime conditions for those 3 regex functions. – jotik Mar 23 '16 at 22:37
  • I want to modify my code and filter my function arguments to ensure these wouldn't throw, or would only throw certain (`regex_error`) exceptions. Reflecting on this discussion, I made an attempt to improve my original questions a bit. – jotik Mar 23 '16 at 22:53
  • @jotik Seems a pretty safe bet that the example in your question will not throw. It'll only throw if the match is too complex. – Barry Mar 24 '16 at 01:04
  • Yes, but the standard doesn't provide a definition for safe? – jotik Mar 24 '16 at 17:39
3

C++11 §28.6 states

The class regex_error defines the type of objects thrown as exceptions to report errors from the regular expression library.

Which means that the <regex> library should not throw anything else by itself. You are correct that constructing a regex_error which inherits from runtime_error may throw bad_alloc during construction due to out-of-memory conditions, therefore you must also check for this in your error handling code. Unfortunately this makes it impossible to determine which regex_error construction actually throws bad_alloc.

For regular expressions algorithms in §28.11 it is stated in §28.11.1 that

The algorithms described in this subclause may throw an exception of type regex_error. If such an exception e is thrown, e.code() shall return either regex_constants::error_complexity or regex_constants::error_stack.

This means that if the functions in §28.11 ever throw a regex_error, it shall hold one of these codes and nothing else. However, note also that things you pass to the <regex> library, such as allocators, might also throw, e.g. the allocator of match_results, which may throw when results are added to the given match_results container. Also note that §28.11 has shorthand functions which "as if" construct match_results, such as

template <class BidirectionalIterator, class charT, class traits>
bool regex_match(BidirectionalIterator first, BidirectionalIterator last,
                 const basic_regex<charT, traits> & e,
                 regex_constants::match_flag_type flags =
                 regex_constants::match_default);

template <class BidirectionalIterator, class charT, class traits>
bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                  const basic_regex<charT, traits> & e,
                  regex_constants::match_flag_type flags =
                  regex_constants::match_default); 

and possibly others. Since such might construct and use match_results with the standard allocator internally, they might throw anything std::allocator throws. Therefore your simple example of regex_match(anyString, regex(".")) might also throw due to construction and usage of the default allocator.

Another caveat to note is that for some <regex> functions and classes it is currently impossible to determine whether a bad_alloc was thrown by some allocator or during construction of a regex_error exception.
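
To make the above concrete, here is a minimal sketch of a non-throwing boundary around regex_search. The names `SearchResult` and `safe_search` are hypothetical, and the wrapper cannot tell where a bad_alloc originated; it merely keeps the two exception types apart and swallows anything else:

```cpp
#include <new>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
enum class SearchResult { Found, NotFound, RegexError, OutOfMemory, UnknownError };

// Never lets an exception escape: every failure becomes a result value.
SearchResult safe_search(const std::string& text, const std::regex& re) noexcept
{
    try {
        return std::regex_search(text, re) ? SearchResult::Found
                                           : SearchResult::NotFound;
    } catch (const std::regex_error&) {
        return SearchResult::RegexError;     // error_complexity or error_stack
    } catch (const std::bad_alloc&) {
        return SearchResult::OutOfMemory;    // allocator or regex_error construction
    } catch (...) {
        return SearchResult::UnknownError;   // e.g. thrown by a user-supplied type
    }
}
```

The regex itself is constructed by the caller, so construction errors still have to be handled separately.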

In general, if you need something with better exception specifications, avoid using <regex>. If you require simple pattern matching, you're better off rolling your own safe match/search/replace functions, because it is impossible to constrain your regular expressions to avoid these exceptions in a portable or forward-compatible manner. Even using an empty regular expression "" might give you an exception.

PS: Note that the C++11 standard is rather poorly written in some aspects, lacking complete cross referencing. E.g. there's no explicit notice under the clauses for the methods of match_results to throw anything, whereas §28.10.1.1 states (emphasis mine):

In all match_results constructors, a copy of the Allocator argument shall be used for any memory allocation performed by the constructor or member functions during the lifetime of the object.

So take care when browsing the standards like a lawyer! ;-)

mceo
2

I believe this is what exceptions you should be handling.
For compiling there are 3 exceptions.
For search/match/replace you probably only need to handle 2.

Btw, if you don't handle exceptions the way described below, then your
code will be flying blind, and not meant for human consumption.

#include <regex>
#include <stdexcept>
#include <string>

std::regex Regex;

bool CompileRegex( const std::string& strRx, std::regex::flag_type rxFlags )
{
    try
    {
        Regex.assign( strRx, rxFlags );
    }
    catch ( const std::regex_error & e )
    {
            // handle e
        return false;
    }
    catch ( const std::out_of_range & e )
    {
            // handle e
        return false;
    }
    catch ( const std::runtime_error & e )
    {
            // handle e
        return false;
    }
    return true;
}

bool  UseRegex( const std::string& strSource, std::string& strOut, const std::string& strReplace )
{
    try
    {
        std::smatch match;   // receives the sub-matches of a successful search
        if ( std::regex_search( strSource, match, Regex ) )
        {}
        // or
        strOut = std::regex_replace( strSource, Regex, strReplace );
    }
    catch ( const std::out_of_range & e )
    {
            // handle e
        return false;
    }
    catch ( const std::runtime_error & e )   // also catches std::regex_error
    {
            // handle e
        return false;
    }
    return true;
}
0

This link here might help. As you can see, most of these are about invalid regular expressions rather than invalid inputs (which should not, and do not, throw any errors; they just don't match).

Going through the page here, I can see that regex_replace and the regex constructor may throw one of the regex_error types of exception. I have also seen some memory-related exceptions, but as said, these are runtime exceptions and could be thrown from any piece of code. Since the documentation does not provide anything else, the only way to find out more would be from the code itself.

Athanasios Kataras
  • It might help eventually, but a full good answer on StackOverflow would help me (and others) much more. – jotik Mar 19 '16 at 19:39
  • May these functions also throw something else than `regex_error`? – jotik Mar 19 '16 at 19:48
  • As all functions can, other exceptions may be thrown, but not regex specific. i.e. something can always go wrong and get an arithmetic or null exception that was not handled and thrown due to a bug in regex library code. – Athanasios Kataras Mar 20 '16 at 10:30
  • @AthanasiosKataras: You may be confusing C++ and another language. There are no "null pointer exceptions" in C++. And the only arithmetic exceptions are overflow and underflow, which really make no sense here. – MSalters Mar 22 '16 at 11:20
  • Quite right, I was thinking in terms of C#. There is no point in catching nulls in C++. It was just a comment on generic run-time exceptions that are not explicitly thrown by the code. – Athanasios Kataras Mar 22 '16 at 12:19
0

See pp. 735-736 of Josuttis' "The C++ Standard Library", 2nd Edition. Here's a list of exceptions, each with a text explanation on the next two lines:

std::regex_constants::error_collate:
"error_collate: "
"regex has invalid collating element name";
std::regex_constants::error_ctype:
"error_ctype: "
"regex has invalid character class name";
std::regex_constants::error_escape:
"error_escape: "
"regex has invalid escaped char. or trailing escape";
std::regex_constants::error_backref:
"error_backref: "
"regex has invalid back reference";
std::regex_constants::error_brack:
"error_brack: "
"regex has mismatched ’[’ and ’]’";
std::regex_constants::error_paren:
"error_paren: "
"regex has mismatched ’(’ and ’)’";
std::regex_constants::error_brace:
"error_brace: "
"regex has mismatched ’{’ and ’}’";
std::regex_constants::error_badbrace:
"error_badbrace: "
"regex has invalid range in {} expression";
std::regex_constants::error_range:
"error_range: "
"regex has invalid character range, such as ’[b-a]’";
std::regex_constants::error_space:
"error_space: "
"insufficient memory to convert regex into finite state";
std::regex_constants::error_badrepeat:
"error_badrepeat: "
"one of *?+{ not preceded by valid regex";
std::regex_constants::error_complexity:
"error_complexity: "
"complexity of match against regex over pre-set level";
std::regex_constants::error_stack:
"error_stack: "
"insufficient memory to determine regex match";
rsjaffe
  • 2
    Technically these aren't exceptions, but constants. Poor OO design here; having to do a `switch` inside a `catch` is quite unfortunate. – MSalters Mar 22 '16 at 11:24