
Is it considered safe to use user-defined expressions with std::regex (e.g. for a server-side search)? Does the standard library make any guarantees about how safely broken expressions are handled?

Jarod42
    What do you mean by "safe"? Safe against remote code execution, denial-of-service, exceptions? – rustyx Feb 06 '19 at 15:23

3 Answers


Basically, no, it is not safe. Perfectly legal regular expressions can be crafted that take an extraordinarily long time to evaluate, causing denial of service.

From the Wikipedia article on ReDoS:

The regular expression denial of service (ReDoS)[1] is an algorithmic complexity attack that produces a denial-of-service by providing a regular expression that takes a very long time to evaluate. The attack exploits the fact that most regular expression implementations have exponential time worst case complexity: the time taken can grow exponentially in relation to input size. An attacker can thus cause a program to spend an unbounded amount of time processing by providing such a regular expression, either slowing down or becoming unresponsive.
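To make the quoted description concrete, here is a minimal sketch that times a single match. The helper name timed_match and the struct MatchTiming are made up for illustration, and the pattern (a+)+b is just one commonly cited example of nested quantifiers. On a backtracking implementation the time for a non-matching subject may grow roughly exponentially with the number of 'a' characters, so keep the subject short when trying it.

```cpp
#include <chrono>
#include <regex>
#include <string>

// Hypothetical result type: whether the subject matched and how long it took.
struct MatchTiming {
    bool matched;
    std::chrono::microseconds elapsed;
};

// Hypothetical helper: compile `pattern`, then time one std::regex_match call.
// Construction can throw std::regex_error; that is not handled here on purpose.
MatchTiming timed_match(const std::string& pattern, const std::string& subject) {
    const std::regex re(pattern);

    const auto start = std::chrono::steady_clock::now();
    const bool matched = std::regex_match(subject, re);
    const auto stop = std::chrono::steady_clock::now();

    return {matched,
            std::chrono::duration_cast<std::chrono::microseconds>(stop - start)};
}
```

Calling timed_match("(a+)+b", std::string(n, 'a') + "c") for slowly growing n and printing elapsed.count() should show the growth on such an implementation. The actual numbers depend on the standard library and the hardware.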

Galik

The standard requires an implementation to throw an exception (std::regex_error) when the regex passed to the constructor is invalid.

[regex.construct-3]:

explicit basic_regex(const charT* p, flag_type f = regex_constants::ECMAScript);

Requires: p shall not be a null pointer.

Throws: regex_error if p is not a valid regular expression.

Effects: Constructs an object of class basic_regex; the object's internal finite state machine is constructed from the regular expression contained in the array of charT of length char_traits<charT>::length(p) whose first element is designated by p, and interpreted according to the flags f.

Ensures: flags() returns f. mark_count() returns the number of marked sub-expressions within the expression.

There is even a table (the regex_constants::error_type values, such as error_paren and error_brack) detailing the different kinds of errors possible.

So as long as you do not pass a null pointer, there should be no undefined behavior in creating a regex from a user-provided string.
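As a rough sketch of what that looks like in practice, the helper below (try_compile is a made-up name, not something from the standard) wraps construction in a try/catch and reports an invalid pattern as an empty optional. Taking a std::string instead of a const charT* also sidesteps the null-pointer precondition.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: compile a user-supplied pattern.
// Returns std::nullopt if the constructor rejects it with std::regex_error.
std::optional<std::regex> try_compile(const std::string& pattern) {
    try {
        return std::regex(pattern, std::regex_constants::ECMAScript);
    } catch (const std::regex_error&) {
        // e.code() would say which regex_constants::error_type was hit;
        // this sketch only cares that the pattern is unusable.
        return std::nullopt;
    }
}
```

A caller would check the result before searching, e.g. `if (auto re = try_compile(user_input)) { /* use *re */ }`. This only covers invalid patterns; it does nothing about valid patterns that are expensive to evaluate.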

Note that any practical implementation may of course still have bugs that lead to security vulnerabilities. The standard also obviously doesn't guarantee that a malicious user has no way to DoS your system by submitting a very complex regex that backtracks heavily, produces too many matches, uses too much memory/CPU etc., so you'll have to consider that yourself. But if you are just worried about whether an invalid regex can lead to UB, the answer is "no, you're fine".
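One way to start "considering that yourself" is sketched below. Everything here is an assumption for illustration: the name guarded_search, the SearchOutcome enum, and especially the two length limits, which are arbitrary values and not recommendations. Length caps only shrink the attack surface; a short pattern can still backtrack badly, so a stronger defence needs a time budget enforced from outside the call or a non-backtracking engine.

```cpp
#include <cstddef>
#include <regex>
#include <string>

enum class SearchOutcome { Found, NotFound, Rejected };

// Illustrative limits only (assumed values); tune or replace for a real service.
constexpr std::size_t kMaxPatternLength = 64;
constexpr std::size_t kMaxSubjectLength = 4096;

// Hypothetical helper: refuse oversized inputs, then search, treating any
// std::regex_error as a rejection.
SearchOutcome guarded_search(const std::string& pattern, const std::string& subject) {
    if (pattern.size() > kMaxPatternLength || subject.size() > kMaxSubjectLength) {
        return SearchOutcome::Rejected;
    }
    try {
        const std::regex re(pattern);
        return std::regex_search(subject, re) ? SearchOutcome::Found
                                              : SearchOutcome::NotFound;
    } catch (const std::regex_error&) {
        // Either the pattern was invalid, or the implementation gave up while
        // matching (some report this as error_complexity or error_stack).
        return SearchOutcome::Rejected;
    }
}
```

For example, guarded_search("wor+ld", "hello world") finds a match, while an unbalanced pattern such as "(abc" or a pattern longer than the cap is rejected without reaching the matcher.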

Max Langhof

The C++ standard defines what correct behavior means. Even in cases of functions that throw exceptions, the standard defines which functions throw, what exceptions will be thrown, and what circumstances will cause such exceptions to be thrown. Such code has behavior that is well-defined by the standard.

The standard does not, and cannot, specify what happens if an implementation is operating against the behavior defined by the standard (i.e. is "broken"). If the standard were to define such behavior, then implementations would not be operating against the standard, by definition. So they would no longer be "broken".

So, whether regex implementations are able to avoid pathological behavior caused by externally-provided strings which you did not sanitize is a matter of quality-of-implementation, not standard-defined behavior.

Nicol Bolas