17

I want to add a feature to my website to let users search the texts with RegEx. But, is it safe to let the users do something like that ?

preg_match('/' . $user_input_regex . '/', $subject);
Cole Tobin
  • 9,206
  • 15
  • 49
  • 74
Sdgsdg Asgasgf
  • 205
  • 2
  • 6
  • 3
    You'd probably need to escape it using [preg_quote()](http://www.php.net/manual/en/function.preg-quote.php), and it isn't easy to trap errors gracefully if the user input is a malformed regexp – Mark Baker Jul 19 '14 at 14:02
  • 3
    @MarkBaker But if i escape the RegEx characters with preg_quote() the RegEx will not work – Sdgsdg Asgasgf Jul 19 '14 at 14:05
  • Personally, that code looks fine to me. I'm no expert, but that command can't edit anything... the worst they can do is give you messed up results. However, it's a different story if you're running RegExes from different users. – Anonymous Penguin Jul 19 '14 at 14:09
  • 1
    @AnnonomusPenguin The user provided "regex2 could contain php code. – arkascha Jul 19 '14 at 14:09
  • 1
    Escaping the user regex with preg_quote in this case is counterproductive to the op idea because all regex signs will be escaped. This makes the user regex useless because it will be handled as a normal string -> no regex matching will occure. – Ota Jul 19 '14 at 14:15
  • @SdgsdgAsgasgf - [**This section on preventing SQL injection**](https://stackoverflow.com/questions/60174/how-can-i-prevent-sql-injection-in-php) on SO, may contain useful information you can use, even though most of it is based on DB-related issues. – Funk Forty Niner Jul 19 '14 at 14:18
  • don't know about php, but java can go recursive using regex – Leo Jul 19 '14 at 22:56
  • You could go with a client-side regex search if possible. – Jwosty Jul 19 '14 at 23:54
  • 1
    @MarkBaker There is a `filter_var` (and `filter_input`) filter that validates regex: http://nz2.php.net/manual/en/filter.filters.validate.php – svbnet Jul 25 '14 at 06:23

2 Answers2

12

There is a possible attack on this code called a ReDoS attack (Regular expression Denial of Service).

The Regular expression Denial of Service (ReDoS) is a Denial of Service attack, that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size). An attacker can then cause a program using a Regular Expression to enter these extreme situations and then hang for a very long time.

Specifically with preg_match there is a known issue that can cause a PHP Segmentation Fault.

So the answer is no, it is not safe because of issues such as these.

SilverlightFox
  • 32,436
  • 11
  • 76
  • 145
  • 1. With ReDoS the attacker can only slow the server but can't do any damage other then that, right? – Sdgsdg Asgasgf Jul 19 '14 at 15:04
  • 2. PHP will not "go out" after the memory of the request exceeded the memory limit? What i mean is: can i prevent the problem by reducing the memory limit and not allowing too many searches at once? – Sdgsdg Asgasgf Jul 19 '14 at 15:06
  • 1
    @SdgsdgAsgasgf: 1) Other than making your server unresponsive to all requests, no. 2) There [may be a workaround for the segfault](http://php.net/manual/en/function.preg-match.php#88587). – SilverlightFox Jul 19 '14 at 15:35
  • 1
    Is it possible to limit the length of the input and maybe the "absurdity" so the effects of this attack are mitigated? There would still be a risk of a ReDDos, but no more than any other DDoS. – trysis Jul 19 '14 at 16:59
  • 3
    You can always use a regex implementation based on DFAs that do *not* have this kind of problem since they have a linear worst case instead of exponential (although they have *some* limitations). – Bakuriu Jul 19 '14 at 17:12
  • 2
    Why not give a time limit for a search? If the search takes >`x` seconds, then abort. – Justin Jul 19 '14 at 21:24
  • @Bakuriu The NFA→DFA compilation step can be very expensive; that's when you pay the complexity costs. Everything is good once you've done that, but with user-defined REs you're _still_ paying at least once (and possibly every time if your caching sucks). – Donal Fellows Jul 20 '14 at 15:25
  • @Quincunx So you'd put in lots of thread interrupt machinery to handle what is a rare case in practice? (The CPU will be busy during all that time, so simple in-thread timeouts such as those you'd use in I/O aren't so useful.) The _real_ way of dealing with these things is simply to not let the user specify arbitrary REs (unless they're the real owner of the process, in which case they can also own the core dump or pegged CPU…) – Donal Fellows Jul 20 '14 at 15:29
  • @DonalFellows Well, you can avoid the conversion altogether. See [this](http://swtch.com/~rsc/regexp/regexp1.html) series of articles. The time for matching grows to O(n^2) (being O(n) or O(nlog(n)) on the average case I believe) which is *much much* better than O(2^n) of backtracking regexes. Sure `n^2` may still allow for some DoS attacks, but the number of requests must be a lot higher than using "standard" regex engines. – Bakuriu Jul 20 '14 at 15:40
4

Security wise you should never trust user input, so it depends what you do with the input. In your given case you should at least escape the used delimiter (backslash) in the user input to ensure the regex works.

Ota
  • 696
  • 3
  • 11