0

Possible Duplicate:
Sanitization of User-Supplied Regular Expressions in PHP

Let's say you want to let users search for something and your search function has the ability to accept regular expressions.

Is it OK to let site users search by regexes that they post? From a user's point of view, I'd love a site that let me do that :D

Is there any security risk involved? How can I sanitize a regex?

Alex
  • I think allowing users to search using wildcards (* and ?) is a better idea: it's easier to implement and it would be more resource friendly. – fardjad Oct 17 '12 at 10:46
  • There are dangers with wildcards too, though; they can consume your resources as well if you're not careful. But I agree with the reasoning. – eis Oct 17 '12 at 11:17

3 Answers

2

The main risk is that a regular expression is complex enough to run for ages or to hit the recursion limit of the engine. See this article. Other risks arise if you let your users use regex replacement in the wrong places, because that introduces the risk of code injection. But matching itself cannot really do any harm other than DoSing your server.

There has been a question recently on how to recognize these dangerous regexes and the consensus was that it is not generally possible. See the question.
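
To illustrate how little can be checked up front, here is a minimal Python sketch. The function name `validate_user_regex` and the 200-character cap are assumptions made for this example, not something taken from the linked question. It only rejects patterns that are too long or that fail to compile; it cannot tell whether a pattern that compiles will backtrack catastrophically.

```python
import re

MAX_PATTERN_LENGTH = 200  # arbitrary cap chosen for this sketch


def validate_user_regex(pattern):
    """Return a compiled pattern, or raise ValueError if it is rejected.

    This is a syntax and size check only. A pattern such as (a+)+$
    passes it even though it can take exponential time on some inputs.
    """
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("pattern too long")
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid pattern: {exc}") from exc
```

A check like this is useful for giving the user a clean error message, but it does not replace a limit on execution time.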

You are probably best off restricting the time your regex search can take and aborting it if it takes too long.
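
One way to sketch that time limit, assuming Python rather than PHP: the standard `re` module has no timeout of its own, so this example runs the match in a child interpreter and kills it when the deadline passes. The helper name `search_with_timeout` and the two-second default are assumptions for illustration, and spawning a process per search is far too heavy for a busy site; treat it as a demonstration of the idea, not a production design.

```python
import subprocess
import sys

# Code run in the child process. Exit codes: 0 = match, 1 = no match,
# 2 = the pattern did not compile.
_CHILD = r"""
import re, sys
try:
    rx = re.compile(sys.argv[1])
except re.error:
    sys.exit(2)
sys.exit(0 if rx.search(sys.argv[2]) else 1)
"""


def search_with_timeout(pattern, text, seconds=2.0):
    """Return True/False for match/no match, or None if the search timed out.

    Pattern and text travel as command-line arguments, so this sketch only
    suits short inputs without NUL bytes.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            timeout=seconds,  # the child is killed when this expires
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode == 2:
        raise ValueError("invalid regular expression")
    return proc.returncode == 0
```

A caller would treat `None` as "search aborted" and show the user an error instead of results, for example when given a pattern like `(a+)+$` and a long run of `a` characters followed by a `b`.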

Martin Ender
2

I don't see a direct security risk, but I do see performance issues that can easily cause serious downtime. There are two flavors of this: regexes that are too complex and regexes that are too broad. Consider, for example, a query like .* against a big database; I've seen even a couple of those bring systems down.

I would execute user searches against something other than the actual live database, preferably against cached results in memory, where this should not matter as much.

Or just implement only wildcards (* and ?), as suggested in the comments. They're both more user friendly and easier to deal with.
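
A minimal sketch of the wildcard approach, assuming the search ends up as an SQL `LIKE` query: `*` becomes `%`, `?` becomes `_`, and any `%`, `_` or backslash the user typed is escaped so it matches literally. The names `wildcard_to_like` and `search_titles`, the `items` table and its `title` column are all hypothetical, and SQLite stands in for whatever database the site really uses.

```python
import sqlite3

LIKE_ESCAPE = "\\"  # escape character declared in the ESCAPE clause below


def wildcard_to_like(term):
    """Translate a * / ? wildcard term into a LIKE pattern."""
    out = []
    for ch in term:
        if ch == "*":
            out.append("%")               # any run of characters
        elif ch == "?":
            out.append("_")               # exactly one character
        elif ch in ("%", "_", LIKE_ESCAPE):
            out.append(LIKE_ESCAPE + ch)  # make LIKE metacharacters literal
        else:
            out.append(ch)
    return "".join(out)


def search_titles(conn, term):
    """Run the translated pattern as a bound parameter, never via string concatenation."""
    sql = "SELECT title FROM items WHERE title LIKE ? ESCAPE '\\' ORDER BY title"
    return [row[0] for row in conn.execute(sql, (wildcard_to_like(term),))]
```

As the comment above notes, a broad pattern such as a lone `*` can still be expensive on a large table, so a result limit or a minimum term length is worth considering alongside this.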

eis
1

If the regex doesn't affect the program code, there's no real security risk. The reason it's often not implemented, I believe, is that it is a costly procedure. I have never seen it used in SQL, so you would need to fetch ALL the content being searched and then run the regex on it, rather than relying on the simplicity of SQL LIKE or exact matching.

Jon
  • um, a lot of databases have a like operator that allows for regular expressions too, using a separate keyword. If I understood correctly you've never seen a thing like that? – eis Oct 17 '12 at 10:44
  • well, `regexp` in MySQL, but it doesn't, from my knowledge, allow the full power of regex. – Jon Oct 17 '12 at 10:51
  • there's also [oracle regexp](http://psoug.org/reference/regexp.html) and [postgresql regular expressions](http://www.postgresql.org/docs/8.3/static/functions-matching.html) and [firebird regular expressions](http://www.firebirdsql.org/refdocs/langrefupd25-similar-to.html)... the only one I know that doesn't have 'em is MSSQL. – eis Oct 17 '12 at 11:00
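
To make the point in these comments concrete, here is a small Python sketch of pushing the regex into the query instead of fetching every row first. It uses SQLite only because it ships with Python; SQLite leaves its `REGEXP` operator undefined until the application registers a `regexp()` function, which is what `open_db` does here. The helper names and the `items` table are hypothetical, and the other databases mentioned above have their own syntax and regex dialects.

```python
import re
import sqlite3


def _regexp(pattern, value):
    """Backs the SQL expression `value REGEXP pattern` (SQLite passes the pattern first)."""
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    # Register the two-argument function that SQLite's REGEXP operator calls.
    conn.create_function("REGEXP", 2, _regexp)
    return conn


def regex_search(conn, pattern):
    # The user's pattern is a bound parameter, so it cannot alter the SQL itself.
    sql = "SELECT title FROM items WHERE title REGEXP ? ORDER BY title"
    return [row[0] for row in conn.execute(sql, (pattern,))]
```

The pattern still runs once per row with no time limit, so the denial-of-service concern from the other answers applies here just as much.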