13

Edit: tchrist has informed me that my original accusations about Perl's insecurity are unfounded. However, the question still stands.

I know that in Perl, you can embed arbitrary code in a regular expression, so obviously accepting a user-supplied regex and matching it allows arbitrary code execution and is a clear security hole. But is this true for all languages that use regular expressions? Is it true for all languages that use "Perl-compatible" regular expressions? In which languages are user-supplied regexes safe to use, and in which languages do they allow arbitrary code execution or other security holes?

Ryan C. Thompson
  • I suspect the most practical answer is going to be "Don't do that." – Ryan C. Thompson Nov 27 '10 at 03:05
  • I'm pretty sure it could be used for a DoS attack in most languages; I remember reading something about how nested *'s can make regex matching REALLY slow – Bwmat Nov 27 '10 at 03:08
  • "Perl-compatible regular expressions" is kind of a weird phrase. Since Perl can be embedded in them, they're not actually regular expressions (I think the Perl docs call them "patterns" or "matchers" or something), and in order to be truly compatible you need all of Perl. :-) – Ken Nov 27 '10 at 03:15
  • All of them. A security hole just means that a possibility for exploitation exists, and it does, regardless of protections that may be included by individual languages. – Cody Gray - on strike Nov 27 '10 at 03:17
  • @Ken, nothing that has backreferences in it is a REGULAR regular expression. Nobody uses those anymore. – tchrist Nov 27 '10 at 03:41
  • tchrist: I've written far more 'regular' regexes than I have regexes requiring Perl compatibility. I think "PC" is the weirder half of "PCRE". :-) – Ken Nov 27 '10 at 04:06
  • tchrist: Regular languages can have backreferences; the backreferences just can't have Kleene closures in them. – Gabe Nov 27 '10 at 04:10
  • @Gabe: But backreferences require extra memory proportional to the input length, and there's no way to do that with a DFA. How can a backreffed language still be regular? Of course, these days “backrefs” are a bit of a misnomer, since under some circumstances the capture groups can also be forward refs. Weird world. – tchrist Nov 27 '10 at 13:42
  • tchrist: Backrefs may exponentially multiply the number of states in your DFA, but without closures they will still be finite, thus keeping your language regular. For example, `/([ab]).*\1/` is the same language as `/(a.*a)|(b.*b)/` so the backref is just syntactic sugar. However `/([ab]+).*\1/` cannot be written without backrefs so it is not regular. – Gabe Nov 27 '10 at 15:28

8 Answers

18

In most languages, allowing users to supply regular expressions means that you open yourself up to a denial-of-service attack.

Some regular expressions are extremely CPU-intensive to execute. So in general it's a bad idea to allow users to enter regular expressions that will be executed on a remote system.

For more info, read this page: http://www.regular-expressions.info/catastrophic.html

Wolph
  • I can't agree with the "in general it's a bad idea to allow users to enter regular expressions" part. Maybe that's true for a web app but I don't think it's a general rule that applies to all software. Where would we be as programmers if our tools didn't allow us to use regular expressions for searching? – Bryan Oakley Nov 27 '10 at 03:34
  • @Bryan Oakley: How would security be an issue at all when you're using the tools on your local computer? You're already executing a program, so executing something from a regular expression hardly seems like a problem in this case. I believe that the question here was pointing towards remote execution, in which case both execution of code and DoS attacks are relevant. – Wolph Nov 27 '10 at 03:45
  • @Ken: if you sandbox it, neither security issue will be a problem anymore. – Wolph Nov 27 '10 at 03:47
  • How does one sandbox a regex? I've never heard of such a thing. – Gabe Nov 27 '10 at 03:48
  • @Gabe: there are multiple ways, of course. If sandboxing the specific regular expression is not possible, then one could consider sandboxing the entire application. For example, instead of executing the regular expression in the main program, run a subprogram with the regular expression in a sandboxed environment. The maximum execution time can easily be enforced with a `kill` command after a specific time. The `eval` security issue is a little more complex, however; that would require restricting all IO as well. – Wolph Nov 27 '10 at 03:54
  • @Gabe, @WoLpH: You can sandbox Perl code using `Safe` compartments. This is a way to limit the opcodes available to the compiler. If you established a policy for a compartment that didn't allow an `eval` opcode, they couldn't compile code calling for that, because the compiler wouldn't have access to that opcode. It's a you-can't-get-there-from-here kind of thing. – tchrist Nov 27 '10 at 13:38
  • @WoLpH: my point exactly. There are many instances where security just isn't an issue, so this "general" rule isn't very general. The question mentioned nothing about remote execution. It just asked about languages without any context in which they are used. – Bryan Oakley Nov 27 '10 at 14:13
  • tchrist: Can you point to an example of sandboxing a regex in Perl? I'm curious as to whether it's really safe or it's like taint checking where it was just presumed safe. – Gabe Nov 27 '10 at 15:33
  • Gabe: I'm the furthest thing from a Perl expert, but I suspect it might be difficult/impossible there, short of running it in its own jail process, due to how they're a primitive part of the language. If the matching is done in a library, you could possibly just increment a counter in the backtracking/recursion step, and fail if the counter exceeds some predefined limit. – Ken Nov 27 '10 at 17:19
  • Ken: I'm willing to believe tchrist's advice with regards to Perl, but I'd still like to see it before offering that advice to somebody else. – Gabe Nov 28 '10 at 06:15
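
The "run it in a subprogram and kill it after a specific time" idea from the comments above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the helper name `search_with_timeout` and the exit-status convention are made up for illustration, and it limits only wall-clock time (not memory, and not anything else the child process could do). It relies on the fact that the standard `re` module takes no timeout argument, so the limit has to be enforced from outside.

```python
import subprocess
import sys

# Child program: exit status 0 = match, 1 = no match, 2 = invalid pattern.
# (This convention is an assumption of the sketch, not a standard.)
_CHILD_SOURCE = r"""
import re, sys
pattern, text = sys.argv[1], sys.argv[2]
try:
    found = re.search(pattern, text) is not None
except re.error:
    sys.exit(2)
sys.exit(0 if found else 1)
"""


def search_with_timeout(pattern, text, timeout_s=2.0):
    """Return True/False for match/no match, or None if the time limit was hit."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD_SOURCE, pattern, text],
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return None
    if proc.returncode == 0:
        return True
    if proc.returncode == 1:
        return False
    raise ValueError(f"pattern rejected or child failed (exit {proc.returncode})")
```

Passing the pattern and text as command-line arguments keeps the sketch short; for large inputs one would more likely send them over stdin.
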
6

This is not true: you cannot execute code callbacks in Perl by sneaking them into an evaluated regex. This is forbidden. You have to specifically override that with a lexically scoped

use re "eval";

if you expect to have both interpolation and code escapes happening in the same pattern.

Watch:

% perl -le '$x = "(?{ die 'naughty' })"; "aaa" =~ /$x/'
Eval-group not allowed at runtime, use re 'eval' in regex m/(?{ die naughty })/ at -e line 1.
Exit 255

% perl -Mre=eval -le '$x = "(?{ die 'naughty' })"; "aaa" =~ /$x/'
naughty at (re_eval 1) line 1.
Exit 255
tchrist
  • Was it ever the default to allow code callbacks in regexes? I vaguely remember reading about this possibility back when I learned perl, almost a decade ago. But I can't remember whether it said "this is possible" or "this is on by default." – Ryan C. Thompson Nov 27 '10 at 06:32
  • @Ryan, it may have briefly been so. I seem to recall jumping up and down about the security matter, saying we couldn't release it with this problem, and that the `use re "eval"` pragma turned up as the fix. I don't know whether it was ever released insecurely. But that's over ten years back, and I'd have to review the p5p mail log to refresh my memory about it. – tchrist Nov 27 '10 at 13:35
  • Is there any stupid way that a programmer can use `qr//` to slip some user-supplied value into a regex such that eval will work? – Gabe Nov 27 '10 at 15:40
  • @Gabe: No. The regex compiler will not tolerate both interpolation and code escapes in the same pattern unless `use re "eval"` is active in the current lexical scope. Beyond that, there is also a difference between tainted and untainted data to be observed where appropriate. – tchrist Nov 27 '10 at 17:54
2

It's generally dynamic languages with an eval facility that tend to have the ability to execute code from regular expressions. In static languages (i.e. those requiring a separate compilation step) there is generally no way to execute code that wasn't compiled, so evaluating code from within a regex is impossible.

Without a way to embed code in a regex, the worst a user can do is write a regex that takes a long time to evaluate.
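
If the matcher is a library you control, "takes a long time" can be bounded by counting steps and giving up past a limit, as suggested elsewhere in this thread. Here is a rough Python sketch of that idea. Everything in it is hypothetical: it is a toy backtracking matcher that understands only literals, `.`, `*`, a leading `^` and a trailing `$`, and the names `match_with_budget` and `StepLimitExceeded` are invented for illustration. Real engines are far more complex, but the budget check would sit in the same place.

```python
class StepLimitExceeded(Exception):
    """Raised when the matcher has used up its step budget."""


def match_with_budget(pattern, text, max_steps=10_000):
    """Search text for pattern, aborting after max_steps matching steps."""
    steps = 0

    def tick():
        # Count one unit of work; fail hard once the budget is spent.
        nonlocal steps
        steps += 1
        if steps > max_steps:
            raise StepLimitExceeded(f"gave up after {max_steps} steps")

    def match_here(p, t):
        tick()
        if p == len(pattern):
            return True
        if p + 1 < len(pattern) and pattern[p + 1] == "*":
            return match_star(pattern[p], p + 2, t)
        if pattern[p] == "$" and p + 1 == len(pattern):
            return t == len(text)
        if t < len(text) and pattern[p] in (".", text[t]):
            return match_here(p + 1, t + 1)
        return False

    def match_star(c, p, t):
        # Try the rest of the pattern after zero, one, two, ... copies of c.
        while True:
            if match_here(p, t):
                return True
            if t < len(text) and c in (".", text[t]):
                t += 1
            else:
                return False

    if pattern.startswith("^"):
        return match_here(1, 0)
    return any(match_here(0, start) for start in range(len(text) + 1))
```

With this, `match_with_budget("a*b", "xaab")` succeeds quickly, while a pattern such as `a*a*a*a*a*b` against a long run of `a` characters raises `StepLimitExceeded` instead of grinding on.
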

Gabe
2

1) Vulnerabilities are found in the regex libraries themselves, such as a buffer overflow that affects WebKit and allows an attacker to gain remote code execution by reaching the regex library from JavaScript.

2) It is a DoS condition in C#.

3) User-supplied regexes can be dangerous in PHP because of modifiers. Adding the /e modifier makes preg_replace eval() the replacement string for each match (the modifier was deprecated in PHP 5.5 and removed in PHP 7.0). In this case system() will be eval()'ed.

preg_replace("/.*/e", "system('echo /etc/passwd')", $subject); // $subject (placeholder): the string being searched

Or in the form of a vulnerability:

preg_replace($_GET['regex'], $_GET['check'], $subject); // $subject (placeholder): the string being searched

rook
2

User-supplied regex, or in general, user input, should never be treated as safe - regardless of the programming language. If your program fails to do so, it is vulnerable to attacks by deliberately crafted inputs.
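
As one illustration of not trusting the input: when users only need a plain substring search rather than real pattern syntax, the input can be escaped so that every metacharacter is matched literally. A small Python sketch, where the helper name `literal_search` is made up for this example:

```python
import re


def literal_search(user_text, haystack):
    """Search for user_text as a literal string, never as a pattern."""
    # re.escape backslash-escapes regex metacharacters, so "(a+)+$" is
    # matched as those six characters instead of being interpreted.
    return re.search(re.escape(user_text), haystack) is not None
```

This obviously does not help when users genuinely need regex features; it only removes the pattern language from the attack surface.
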

In the case of regex, it can be ReDoS: Regex Denial of Service. Basically, a regex that consumes an excessive amount of CPU and memory to process.

For example, if you try to evaluate this regex

^(([a-z])+.)+[A-Z]([a-z])+$

on this input:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!

you'll notice it may hang. This is called catastrophic backtracking. See it for yourself here: https://regex101.com/r/Qhn3Vb/1

Read more about Regex DoS: https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
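
To see the growth locally, here is a small Python sketch that times the same regex against inputs of increasing length. The helper name `time_match` and the chosen lengths are arbitrary, and the absolute timings depend on the machine, so none are quoted here. The input lengths are kept short on purpose; the full 32-character input above can take much longer.

```python
import re
import time

# The pattern from this answer, compiled once.
PATTERN = re.compile(r"^(([a-z])+.)+[A-Z]([a-z])+$")


def time_match(n):
    """Match n lowercase letters followed by '!' and return (result, seconds)."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = PATTERN.match(text)  # never matches: there is no uppercase letter
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for n in (12, 16, 20, 24):
        result, elapsed = time_match(n)
        print(f"n={n:2d} matched={result is not None} seconds={elapsed:.4f}")
```

The match always fails, but the engine has to try every way of splitting the run of letters between the nested groups before it can give up, which is why each extra character makes it noticeably slower.
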


Bottom line: never assume user input is safe!

Dio Phung
1

Regular expressions are a programming language. I don't think they're quite Turing-complete, but they're close enough that allowing your users to enter them into your web site IS allowing other people to run code on your server. QED, yes, it's a security hole.

You might be able to get away with allowing a subset of whatever regexp language you want to use: whitelist a particular set of constructs to make it a not-big-enough-to-sweat-over hole. Other people have already mentioned the possible dooms of nesting and *. How much you're willing to let people load down your server is up to you. Personally, I'd be comfortable with letting 'em have one SQL "CONTAINS" statement and maybe a "BETWEEN()". :)
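
A whitelist along those lines might look something like this in Python. The particular subset is an arbitrary assumption for the sake of the sketch (letters, digits, space, underscore, `.`, and the quantifiers `*` and `?`), as are the limits and the names `is_allowed_pattern` and `compile_user_pattern`. Because groups are not allowed at all, quantifiers cannot be nested, and the caps on length and quantifier count keep the remaining backtracking small.

```python
import re
import string

# Hypothetical policy: tune these to whatever you are comfortable with.
ALLOWED_LITERALS = set(string.ascii_letters + string.digits + " _")
QUANTIFIERS = set("*?")
MAX_LENGTH = 64
MAX_QUANTIFIERS = 3


def is_allowed_pattern(pattern):
    """Return True if pattern uses only the whitelisted constructs."""
    if len(pattern) > MAX_LENGTH:
        return False
    quantifier_count = 0
    previous_was_quantifier = True  # a quantifier may not start the pattern
    for ch in pattern:
        if ch in QUANTIFIERS:
            if previous_was_quantifier:
                return False  # rejects "**", "*?", and a leading quantifier
            quantifier_count += 1
            if quantifier_count > MAX_QUANTIFIERS:
                return False
            previous_was_quantifier = True
        elif ch == "." or ch in ALLOWED_LITERALS:
            previous_was_quantifier = False
        else:
            return False  # groups, classes, anchors, escapes, etc.
    return True


def compile_user_pattern(pattern):
    """Compile pattern only if it passes the whitelist check."""
    if not is_allowed_pattern(pattern):
        raise ValueError("pattern uses constructs outside the whitelist")
    return re.compile(pattern)
```

So `foo.*bar` gets through, while `(a+)+$` is rejected at the first parenthesis before the regex engine ever sees it.
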

mjfgates
1

I suspect ruby would allow /#{system("rm -rf really_important_directory")}/ - is that the kind of thing you're worried about?

Andrew Grimm
  • Yes, that's pretty much what I had in mind when I asked the question. I didn't even think of DoS until the answers started mentioning it. – Ryan C. Thompson Nov 27 '10 at 06:30
0

AFAIK, you can do it safely in C#: you can supply the regex string to the Regex constructor, and if it fails to parse it'll throw. I'm not sure about others.
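
For comparison, the same parse-or-throw pattern in Python looks roughly like this (a sketch only; `try_compile` is a made-up helper name). Note that it only tells you whether the pattern is syntactically valid, not whether it is cheap to run.

```python
import re


def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern fails to parse."""
    try:
        return re.compile(pattern)
    except re.error:
        # Raised for syntax problems such as an unclosed group.
        return None
```
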

Reinderien
  • Failing to parse is not a security issue. Parsing and then doing something malicious is. – Ryan C. Thompson Nov 27 '10 at 03:14
  • True. In cases like that it depends on what the regex is being used for. If it's as simple as matching a song title in a jukebox, I'd say it's safe. If it's matching an arbitrary filesystem path, maybe not. – Reinderien Nov 27 '10 at 03:22
  • In general it is safe to execute regexes in C#. – Gabe Nov 27 '10 at 03:43