Are there any security concerns if I run a user-defined regular expression on my server against a user-defined input string? I'm not asking about a single language but about any language, with PHP being one of the main languages I would like to know about.
For example, suppose I have the code below:
<?php
if (isset($_POST['regex'])) {
    preg_match($_POST['regex'], $_POST['match'], $matches);
    var_dump($matches);
}
?>
<form action="" method="post">
<input type="text" name="regex">
<textarea name="match"></textarea>
<input type="submit">
</form>
Assuming this is not a controlled environment (i.e. the user can't be trusted), what are the risks of the above code? If similar code is written in other languages, does it carry the same risks? If so, which languages are affected?
I have already found out about 'evil regular expressions'; however, no matter what I try on my computer, they seem to work fine (see below).
PHP
<?php
php > preg_match('/^((ab)*)+$/', 'ababab', $matches);var_dump($matches);
array(3) {
[0] =>
string(6) "ababab"
[1] =>
string(0) ""
[2] =>
string(2) "ab"
}
php > preg_match('/^((ab)*)+$/', 'abababa', $matches);var_dump($matches);
array(0) {
}
JavaScript
phantomjs> /^((ab)*)+$/g.exec('ababab');
{
"0": "ababab",
"1": "ababab",
"2": "ab",
"index": 0,
"input": "ababab"
}
phantomjs> /^((ab)*)+$/g.exec('abababa');
null
This leads me to believe that PHP and JavaScript have a fail-safe mechanism for evil regexes. Based on that, I would assume that other languages have similar features.
Is this a correct assumption?
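To check whether my test inputs above are simply too short to show a slowdown, the same pattern could be timed against progressively longer inputs. The following is only a minimal sketch, assuming Node.js rather than PhantomJS (it relies on `process.hrtime.bigint`); `timeMatch` and `buildInput` are hypothetical helper names made up for this illustration, and the input sizes are arbitrary.

```javascript
// Times a single regex match and reports whether it matched.
// `timeMatch` and `buildInput` are hypothetical helpers for this sketch.
function timeMatch(regex, input) {
  const start = process.hrtime.bigint();
  const matched = regex.test(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { matched, elapsedMs };
}

// Builds "ab" repeated `pairs` times plus a trailing "a", so the overall
// match must fail and the engine has to try alternative ways to split it.
function buildInput(pairs) {
  return 'ab'.repeat(pairs) + 'a';
}

// Same pattern as in the tests above, without the g flag so that
// repeated calls to test() do not carry lastIndex state between runs.
const evil = /^((ab)*)+$/;

// Sizes are kept small on purpose: if the engine backtracks exponentially,
// each extra pair may roughly double the running time.
for (const pairs of [3, 10, 15, 20]) {
  const { matched, elapsedMs } = timeMatch(evil, buildInput(pairs));
  console.log(`pairs=${pairs} matched=${matched} time=${elapsedMs.toFixed(3)}ms`);
}
```

If the reported time grows sharply as `pairs` increases, that would suggest the short inputs were just too small to expose the problem, rather than the engine having a fail-safe.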
Finally, for any or all of the languages that may be vulnerable, is there a way to make sure that a regular expression doesn't cause damage?