Stack overflow parsing very large expressions in ANTLR4

Question

I am working reimplementing an existing DSL in ANTLR4. The existing body of source has some VERY large expressions. It appears that recursion in the ALL(*) logic means that there is a limit on how large an expression I can parse.

Sample grammar: (just enough to reproduce the error here error)

  grammar A4Test;

  fragment DIGIT : [0-9];

  fragment ALPHA : [a-zA-Z];


  WS  :   [ \t\r\n\u000D'] {skip();};
    
  ID  :   ALPHA (ALPHA|DIGIT)*;
        
  NUMBER : '-'?(DIGIT+|(DIGIT*'.'DIGIT+));
     
  e : expr;
          
  expr : '(' expr ')'
    |   expr 'OR' expr
    |   expr 'AND' expr
    |   ID
    |   NUMBER
    ;

Sample Input:

V0 AND 0 OR
V1 AND 1 OR
...  (MANY rows elided)
V3999 AND 3999 OR
V4000 AND 4000

Stack Trace:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.antlr.v4.runtime.misc.TestRig.process(TestRig.java:249)
    at org.antlr.v4.runtime.misc.TestRig.process(TestRig.java:211)
    at org.antlr.v4.runtime.misc.TestRig.main(TestRig.java:143)
Caused by: java.lang.StackOverflowError
    at java.util.Arrays.equals(Arrays.java:1869)
    at org.antlr.v4.runtime.atn.ArrayPredictionContext.equals(ArrayPredictionContext.java:101)
    at java.util.HashMap.getEntry(HashMap.java:471)
    at java.util.LinkedHashMap.get(LinkedHashMap.java:301)
    at org.antlr.v4.runtime.misc.DoubleKeyMap.get(DoubleKeyMap.java:62)
    at org.antlr.v4.runtime.atn.PredictionContext.mergeArrays(PredictionContext.java:418)
    at org.antlr.v4.runtime.atn.PredictionContext.merge(PredictionContext.java:199)
    at org.antlr.v4.runtime.atn.ATNConfigSet.add(ATNConfigSet.java:175)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1126)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1111)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1164)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1111)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1164)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1111)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1164)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1111)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1164)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1111)
    at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1164)

...

Limiting the size of expressions is not an option. They compile fine with the current technology, so we'll have to support it.

Will I have to factor out the left recursion on this to avoid the extremely high stack utilization? Or, is there a simpler answer?

You could try a larger JVM stack: http://stackoverflow.com/questions/3700459/how-to-increase-to-java-stack-size — FreeJack, Jan 17 '14 at 19:17
I have done that and it makes a difference, but in the end just pushes the breaking point out a bit, so I struggle to call it a solution. — Mike Cargal, Jan 17 '14 at 19:49

score 0 · Accepted Answer · answered Jan 17 '14 at 20:27

0

ANTLR 4.2 will improve this situation by incorporating pull request #401. Since it's not yet released, I recommend building the latest release of ANTLR 4 from source and trying your input again.

answered Jan 17 '14 at 20:27

Sam Harwell

97,721
20
209
280

Looks promising. I'll take a look. I already did the left recursion removal and, in a simple test, was shocked by the performance improvement, so I expected to just keep the version without the left recursion. Will your change result in performance anywhere near as fast as the results I got by removing the left recursion? I'd prefer to go back to the simpler syntax (and simpler AST) if the performance is there. – Mike Cargal Jan 17 '14 at 20:42
I've built the latest from source on GitHub. It does appear that this likely prevents (or certainly improves) the situation with the stack overflow. I'm compiling the single expression now and the java memory utilization is gradually increasing. That said, it looks like I may be creating the compulsive case that gives the O(n4) (or near that) performance. The grammar with the left recursion factored out runs very quickly (perhaps sub-second; I haven't timed it); however, running with the left recursion grammar has now been running for over 6 hours, and has still not completed. – Mike Cargal Jan 18 '14 at 21:18
@MikeCargal Can you send me your grammar and sample input that demonstrates the problem? I'd like to run some tests on it. Also, the only O(n⁴) case we've been able to create involved a very obviously poorly structured ambiguous rule. The worst we've seen for unambiguous grammars is O(n³). – Sam Harwell Jan 18 '14 at 22:15
Using your example grammar and input with the latest work on ANTLR 4.2, I'm finding that even inputs 400 times the size of the one you listed in the first post complete in 15 seconds. – Sam Harwell Jan 18 '14 at 23:46
Thanks for the help. I'm not sure what I had done wrong. After working my way back through the build process again, I was able to run my test again and it completes in about a second. Obviously, I had not built something correctly, or had not pointed to my build results correctly. Works like a charm now. – Mike Cargal Jan 19 '14 at 02:56

Stack overflow parsing very large expressions in ANTLR4

1 Answers1