My ANTLR4-based C++ parser dies with an out-of-memory error when I try to parse an expanded header

Question

Hi ANTLR experts out there, I need your help!

I have been using ANTLR for quite a while and have written several parsers with ANTLR4. The version I use now is 4.5.3.

I am writing a C++ parser with the JavaScript target. Until now things have gone quite smoothly. Then I tried to parse iostream.h (actually a small main that includes it, preprocessed with Xcode's LLVM). The expanded file is more than 30000 lines of code and 1.2MB in size. There are parse errors, but those can be fixed. The problem is that I get this out-of-memory error:



  375416 ms: Mark-sweep 1349.2 (1422.3) -> 1349.0 (1432.3) MB, 719.3 / 0 ms [allocation failure] [GC in old space requested].
  376141 ms: Mark-sweep 1349.0 (1432.3) -> 1349.0 (1433.3) MB, 725.6 / 0 ms [allocation failure] [GC in old space requested].
  376900 ms: Mark-sweep 1349.0 (1433.3) -> 1349.0 (1433.3) MB, 758.7 / 0 ms [last resort gc].
  377632 ms: Mark-sweep 1349.0 (1433.3) -> 1349.0 (1433.3) MB, 731.9 / 0 ms [last resort gc].


==== JS stack trace =========================================

Security context: 0x973d6c9e59 
    1: Join(aka Join) [native array.js:179] [pc=0x109304f3a6b5] (this=0x973d604189 ,w=0x109e7ca91511 ,x=108,N=0x1a20734ed919 ,M=0x973d6b4a11 )
    2: InnerArrayJoin(aka InnerArrayJoin) [native array.js:~343] [pc=0x109304f1afac] (this=0x973d604189 ,N=0x1a20734ed919 ,w=0...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
Abort trap: 6

My main currently looks like this:

  try {
    var text = fs.readFileSync(filepath, 'utf8');

    var chars = new antlr4.InputStream(text);
    var lexer = new CPP14Lexer.CPP14Lexer(chars);
    var tokens = new antlr4.CommonTokenStream(lexer);
    var parser = new CPP14Parser.CPP14Parser(tokens);
    var printer;

    try {
      // SLL mode
      parser.buildParseTrees = true;  // the whole parse tree is kept in memory
      parser.setTrace(this.trace);
      var cache = new PredictionContextCache();
      var sim = new ParserATNSimulator(parser, parser._interp.atn, parser._interp.decisionToDFA, cache);
      // sim.parser is the same object as parser, so this sets SLL on the
      // parser's existing interpreter; sim itself is never installed
      sim.parser._interp.predictionMode = PredictionMode.SLL;

      var tree = sim.parser.translationunit();
      printer = new listener.CPPXXPrinter(tokens, lexer, parser, this.debug);

      // walk the finished tree with the printer listener
      antlr4.tree.ParseTreeWalker.DEFAULT.walk(printer, tree);
    }
    catch ...

It tries to build a syntax tree for the entire input. My question is: is there any way to parse in a memory-saving mode, for example by not creating the tree at all and just parsing the input?
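One option to try is switching off parse-tree construction and driving the printer from parse-listener callbacks during the parse. The sketch below is only that, a sketch: it relies on the ANTLR4 JavaScript runtime's `parser.buildParseTrees` flag, `parser.addParseListener`, and `antlr4.atn.PredictionMode.SLL`, while `parseWithoutTree` is a hypothetical helper name. Whether `CPPXXPrinter` works as a parse listener is an assumption, since enter events fire before a rule's children have been matched. It may also not cure the error if the heap is being filled by prediction caches or the buffered token stream rather than by the tree.

```javascript
'use strict';

// Hypothetical helper, not part of ANTLR.
//   parser    - an instance of a generated parser (e.g. CPP14Parser)
//   sllMode   - the runtime's SLL constant (antlr4.atn.PredictionMode.SLL)
//   startRule - name of the grammar's entry rule (e.g. 'translationunit')
//   listener  - optional parse listener that receives events during the parse
function parseWithoutTree(parser, sllMode, startRule, listener) {
  // Do not link rule contexts into a tree, so finished contexts can be
  // garbage collected instead of accumulating for the whole input.
  parser.buildParseTrees = false;

  // SLL prediction is cheaper than full LL; it can report spurious syntax
  // errors on some inputs, in which case a second pass in LL mode is needed.
  parser._interp.predictionMode = sllMode;

  if (listener) {
    // Events are delivered while parsing, so no tree walk is needed afterwards.
    parser.addParseListener(listener);
  }

  // Invoke the entry rule by name and return whatever context it yields.
  return parser[startRule]();
}

// Intended use with the objects from the main above (assumed, not verified):
//   parseWithoutTree(parser, antlr4.atn.PredictionMode.SLL,
//                    'translationunit', printer);
```

The helper takes the parser and the SLL constant as arguments so that it does not depend on how the runtime or the generated files are loaded.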

Thanks -Yoshi

Why would just a parse producing YES/NO be useful in practice? — Ira Baxter, Sep 26 '16 at 11:28
Why aren't you having trouble with preprocessor directives and nasty C++ syntax? The preprocessor is hard to get right, and C++ syntax generally requires arbitrarily-far parser lookahead to resolve some syntax, and in other cases produces ambiguous parses (consider `T * X;`) which ANTLR doesn't handle well. See http://stackoverflow.com/a/1004737/120163 — Ira Baxter, Sep 26 '16 at 11:41
Your main problem might be that C++ requires template definitions in headers, and `iostream.h` sounds like the header which contains a full implementation of ``. It might be worthwhile to investigate if you can simply ignore that specific header. Similarly, you might consider specific optimizations for ``, another common and non-trivial header. — MSalters, Sep 26 '16 at 14:03
Thanks Baxter and MSalters for the comments. Re "Why aren't you having trouble with preprocessor directives and nasty C++ syntax?": it is because I am parsing the preprocessed code. Xcode has a way to run only the preprocessor, so the code does not contain preprocessor directives. It does have a bunch of template definitions, yes. — yoshi, Sep 27 '16 at 06:03
I was wondering if there was a different way to write main that uses less memory. If there is no such method, I will have to think of other approaches, such as not parsing expanded headers at all. The JS runtime is pretty slow, and parsing 30000 lines of code could actually be a bit too much. — yoshi, Sep 27 '16 at 06:21
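The "not parsing expanded headers" idea could be sketched as a filter run on the preprocessed text before it reaches the lexer. This assumes the preprocessor output still carries GCC/Clang-style linemarkers (`# linenum "filename" flags`, where flag `3` marks text coming from a system header); output produced with linemarkers suppressed would pass through unchanged. `stripSystemHeaders` is a hypothetical name, and whether the grammar copes with the missing declarations is untested.

```javascript
'use strict';

// Matches a linemarker such as:  # 42 "/usr/include/stdio.h" 1 3
// Group 3 captures the optional trailing numeric flags.
const LINEMARKER = /^#\s+(\d+)\s+"([^"]*)"((?:\s+\d+)*)\s*$/;

// Hypothetical helper: returns the preprocessed source with every line that
// originates from a system header (linemarker flag 3) removed. The
// linemarkers themselves are dropped as well.
function stripSystemHeaders(preprocessed) {
  const kept = [];
  let inSystemHeader = false;

  for (const line of preprocessed.split('\n')) {
    const m = LINEMARKER.exec(line);
    if (m) {
      // A marker switches the current origin; flag 3 means system header.
      const flags = m[3].trim().split(/\s+/).filter(Boolean);
      inSystemHeader = flags.includes('3');
      continue;
    }
    if (!inSystemHeader) {
      kept.push(line);
    }
  }
  return kept.join('\n');
}
```

The filtered text would then be passed to `antlr4.InputStream` in place of the raw file contents, which shrinks the input to roughly the user's own code.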
