Limit recursion on C++ target

Question

While fuzzing a language made with antlr, the fuzzer reported a slow testcase that was using quite a lot of parens.

One of the rules in the grammar is somewhat like:

paren_expression: '(' expression ')';

Even if it was reported as a slow unit, it underlies the bigger problem of being able to somewhat easily crash the application with enough parens used (and it does on windows which has smaller stack size by default).

From what I searched, there's no option to generate code that checks the stack depth and exits after a reasonable depth, and recovering from stack overflow in C++ is not really a good or portable thing to do.

So, what can be done in this case? Crashing from bad input is not very nice.

Maybe have a global counter, that is incremented, when some functions are entered and decremented, when those leave. And when the counter is at like 10000, output a text and exit. — JCWasmx86, Jul 29 '20 at 17:21
Sure, that would be great, but I have no power to do that. The antlr compiler should do that. — Andrei Damian, Jul 29 '20 at 17:23

score 3 · Accepted Answer · answered Jul 29 '20 at 17:48

3

You could add a predicate that checks how deep the nested expression is, and let the predicate fail if it exceeds a certain number.

For example, you allow a maximum of 3 nested expressions, you could do that like this:

grammar T;

@members {
  private int depth = 0;
}

parse
 : expr EOF
 ;

expr
 : '(' expr ')' {++depth <= 3}?
 | INT
 ;

INT
 : [0-9]+
 ;

The code:

TLexer lexer = new TLexer(CharStreams.fromString("(((42)))"));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.parse();

will parse fine, but the code:

TLexer lexer = new TLexer(CharStreams.fromString("((((42))))"));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.parse();

will throw an exception.

The parts inside the predicate ({...}?) and inside the @members block are target specific code (Java, in this case). You'll have to write that in C++.

answered Jul 29 '20 at 17:48

Bart Kiers

166,582
36
299
288

1

That definitely is better than nothing, but I was hoping for something more general, eg. limit the recursivity for everything antlr related. For the way you suggest I'd have to repeat the same check on every recursive rule. Still better than nothing. Thanks. – Andrei Damian Jul 29 '20 at 18:18
1

AFAIK, there is no setting (or flag/parameter) for this. – Bart Kiers Jul 29 '20 at 18:21
1

Looks like this is the best option to solve this for now. Thanks. – Andrei Damian Jul 31 '20 at 19:43

score 0 · Answer 2 · answered Jul 30 '20 at 07:43

0

This is not a bug, but a feature. Recursive decent parsers have no built-in measure to prevent a stack overflow. It could be you really want to parse deeply nested structures and you are able to throw enough memory at that problem. Why should the parser generator set an arbitrary recursion depth for you, if it doesn't know the problem you are trying to solve?

Use the means of your OS to set a thread's stack depth and you will be able to parse deeply nested expressions. I have a test case with > 700 nested parentheses and that parses fine:

This output is from a macOS test app. In the XCode project I set the stack limit in the "Other Linker Flags" entry to: -Wl,-stack_size,0x10000000. Similar settings are available on other platforms/build tools. However, this is a setting for the main thread of your app. For secondary threads things get tricky, because C++ threads don't allow to specify a stack size during creation.

answered Jul 30 '20 at 07:43

Mike Lischke

48,925
16
119
181

Sure, being able to support deeply nested structures is a good thing. The parser generator should definitely not set an arbitrary limit as it has no knowledge of what I'll be doing with it. But crashing, on any input, is not a reasonable thing to do. Increasing the stack size is just a temporary workaround. So maybe the parser generator should have an option where the user specifies the max depth, or provide other way to fix this. – Andrei Damian Jul 31 '20 at 19:37
It's not only a bug, it's a serious bug. If ANTL used dynamically-allocated memory (heap), then indeed, one could argue that it should be able to consume as much memory as available. But ANTLR uses recursion on the **stack**. In my application it causes a crash on input as short as "((((" with a few thousand parentheses. An unavoidable crash that the application can't avoid, which creates the possibility of a DoS attack on the application. – Nadav Har'El Jul 09 '23 at 10:14

Limit recursion on C++ target

2 Answers2