2

While fuzzing a language made with antlr, the fuzzer reported a slow testcase that was using quite a lot of parens.

One of the rules in the grammar is somewhat like:

paren_expression: '(' expression ')';

Even if it was reported as a slow unit, it underlies the bigger problem of being able to somewhat easily crash the application with enough parens used (and it does on windows which has smaller stack size by default).

From what I searched, there's no option to generate code that checks the stack depth and exits after a reasonable depth, and recovering from stack overflow in C++ is not really a good or portable thing to do.

So, what can be done in this case? Crashing from bad input is not very nice.

Andrei Damian
  • 396
  • 2
  • 6
  • 16

2 Answers2

3

You could add a predicate that checks how deep the nested expression is, and let the predicate fail if it exceeds a certain number.

For example, you allow a maximum of 3 nested expressions, you could do that like this:

grammar T;

@members {
  private int depth = 0;
}

parse
 : expr EOF
 ;

expr
 : '(' expr ')' {++depth <= 3}?
 | INT
 ;

INT
 : [0-9]+
 ;

The code:

TLexer lexer = new TLexer(CharStreams.fromString("(((42)))"));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.parse();

will parse fine, but the code:

TLexer lexer = new TLexer(CharStreams.fromString("((((42))))"));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.parse();

will throw an exception.

The parts inside the predicate ({...}?) and inside the @members block are target specific code (Java, in this case). You'll have to write that in C++.

Bart Kiers
  • 166,582
  • 36
  • 299
  • 288
  • 1
    That definitely is better than nothing, but I was hoping for something more general, eg. limit the recursivity for everything antlr related. For the way you suggest I'd have to repeat the same check on every recursive rule. Still better than nothing. Thanks. – Andrei Damian Jul 29 '20 at 18:18
  • 1
    AFAIK, there is no setting (or flag/parameter) for this. – Bart Kiers Jul 29 '20 at 18:21
  • 1
    Looks like this is the best option to solve this for now. Thanks. – Andrei Damian Jul 31 '20 at 19:43
0

This is not a bug, but a feature. Recursive decent parsers have no built-in measure to prevent a stack overflow. It could be you really want to parse deeply nested structures and you are able to throw enough memory at that problem. Why should the parser generator set an arbitrary recursion depth for you, if it doesn't know the problem you are trying to solve?

Use the means of your OS to set a thread's stack depth and you will be able to parse deeply nested expressions. I have a test case with > 700 nested parentheses and that parses fine:

enter image description here

This output is from a macOS test app. In the XCode project I set the stack limit in the "Other Linker Flags" entry to: -Wl,-stack_size,0x10000000. Similar settings are available on other platforms/build tools. However, this is a setting for the main thread of your app. For secondary threads things get tricky, because C++ threads don't allow to specify a stack size during creation.

Mike Lischke
  • 48,925
  • 16
  • 119
  • 181
  • Sure, being able to support deeply nested structures is a good thing. The parser generator should definitely not set an arbitrary limit as it has no knowledge of what I'll be doing with it. But crashing, on any input, is not a reasonable thing to do. Increasing the stack size is just a temporary workaround. So maybe the parser generator should have an option where the user specifies the max depth, or provide other way to fix this. – Andrei Damian Jul 31 '20 at 19:37
  • It's not only a bug, it's a serious bug. If ANTL used dynamically-allocated memory (heap), then indeed, one could argue that it should be able to consume as much memory as available. But ANTLR uses recursion on the **stack**. In my application it causes a crash on input as short as "((((" with a few thousand parentheses. An unavoidable crash that the application can't avoid, which creates the possibility of a DoS attack on the application. – Nadav Har'El Jul 09 '23 at 10:14