1

I want users to be able to provide relatively simple patterns for matching (for now) different kinds of IDs. I need to evaluate those patterns server side, potentially against a big number of short strings.

The obvious solution is to user regular expressions, but that seems somewhat dangerous, compare https://stackoverflow.com/a/4289936/1239832. Catastrophic backtracking can use a lot of CPU and RAM, which amounts to a DoS vector.

I would be content with something a lot simpler than full PCRE, perhaps something like "posessive" regular expressions with no backtracking. Are there libraries that can do this? PHP would be ideal, but I'd be willing to port, if the code is not too complex.

Community
  • 1
  • 1
brightbyte
  • 971
  • 4
  • 10
  • What kind of patterns? Wouldn't glob-style patterns do? Why not mask submission and evaluation of potentially expensive patterns with a captcha? (The now deprecated `ereg` functions typically could've avoided the majority of backtracking and memory depletion cases). – mario Jul 06 '15 at 23:10
  • you can use "full" pcre and avoid backtracking using atomic grouping, possessive operators and named subpatterns.. – bro Jul 07 '15 at 06:27
  • @mario: Glob with character classes would be a good start, but I'd want repetition counts too. A typical pattern would be "five digits, then a dash, tehn three digits, then once of the letters A, B, or X". Stuff like that. Also, any special handling of / or . would be annoying. – brightbyte Jul 07 '15 at 17:09
  • @bro yes, if I control the regex. But I don't. The expression is user-supplied. – brightbyte Jul 07 '15 at 17:10
  • @brightbyte. ah ok..now i got it. did you consider setting a lower backtracking limit in pcre? http://php.net/manual/en/pcre.configuration.php this should allow you to use the full pcre capabilities while limiting the load on your machine – bro Jul 08 '15 at 06:44
  • @bro PCRE limits may be an option, thanks! Do you know of any analysis of how safe these limits are? And then there is the problem with user fedback. Runtime limits don't give me a way to reject "bad" expressions immediaqtely. Instead, some expressions will fail sometimes, with no good way to tell the expression's autor. Is there even a good way to detect whether the limit was exceeded by looking at the return value of preg_match? Silent failure would lead to false negatives. – brightbyte Jul 08 '15 at 07:56
  • @brightbyte: I normally use the default setting, so I don't have much experience. Mod_security for example uses a very low setting (2000). Maybe you can come up with some extreme cases for your search and then tweak the upper bound accordingly. I agree with you about the user feedback. Internally PCRE would throw an "PCRE_ERROR_MATCHLIMIT" error. But I think in `preg_match` it will just return FALSE. – bro Jul 08 '15 at 08:55

0 Answers0