
I'm trying to use ANTLR for parsing C++ source code, using the ANTLR C++ grammar file.

After generating the lexer, parser and listeners (CPP14BaseListener.java, CPP14Lexer.java, CPP14Listener.java, CPP14Parser.java), I'm trying to run them on a C++ file like this:

private void parseCppFile(String file) throws IOException {
    String p1 = readFile(new File(file), Charset.forName("UTF-8"));
    System.out.println(p1);
    // Get our lexer
    CPP14Lexer lexer = new CPP14Lexer(new ANTLRInputStream(p1));
    // Get a list of matched tokens
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    // Pass the tokens to the parser
    CPP14Parser parser = new CPP14Parser(tokens);
    // Walk it and attach our listener
    ParseTreeWalker walker = new ParseTreeWalker();
    // Specify our entry point
    ParseTree entryPoint = null;//TODO: what is the entry point?
    walker.walk(new CPP14BaseListener(), entryPoint);
}

My question is: which of the generated CPP14Parser methods should I use to get the entry point for parsing the file? (See the TODO comment.)

Alternatively, any pointer for a working example showing how to parse a C++ source file, would be great.

Thanks!

Roy
  • C++ has ambiguous syntax. Trying to parse it with a pure grammar (with no outside ad hoc help for disambiguation) will fail. The grammar being used here does not appear to have any such outside help. It may be possible to patch it up (after all, Clang and GCC manage to parse C++ with just recursive descent), but the effort to do so is likely to be a lot bigger than you think. And then you'll run into preprocessor code. For more details, see https://stackoverflow.com/questions/243383/why-cant-c-be-parsed-with-a-lr1-parser/1004737#1004737 – Ira Baxter Sep 20 '17 at 10:51
  • Hello, I'm trying to do something similar. Can you tell me which library you use, and how you include it in your pom? – sab Oct 03 '17 at 06:59

1 Answer


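The entry point of a grammar is usually the rule that ends with EOF. In your case, try the translationunit rule (ANTLR generates one parser method per rule, so the method name follows the rule's spelling in your copy of the grammar):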

ParseTree entryPoint = parser.translationunit();

In case people don't read the comments, I'll add Mike's noteworthy comment to my answer:

... and if that is not the case (ending in EOF) chances are the first parser rule in a grammar is the entry point (especially if it is not called from anywhere). On the other hand in one of my grammars I defined half a dozen other rules which end with EOF (mostly to parse sub elements of my language). Sometimes it's tricky... :-)
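Both heuristics (a rule ending in EOF, otherwise the first parser rule) can be tried mechanically on a grammar file. The following is only a rough sketch: EntryRuleFinder and its methods are hypothetical helpers, not part of ANTLR, and the scan is a naive regex pass that assumes one rule per line start, no `returns`/`locals` clauses, and no embedded actions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an ANTLR API): guesses the entry rule of a .g4 grammar.
public class EntryRuleFinder {
    // Parser rule names start with a lowercase letter; lexer rules start uppercase.
    private static final Pattern RULE =
            Pattern.compile("(?ms)^\\s*([a-z][A-Za-z0-9_]*)\\s*:(.*?);");
    // A rule body whose last token is EOF.
    private static final Pattern ENDS_WITH_EOF = Pattern.compile("(?s).*\\bEOF\\s*");

    // Replace quoted literals (which may contain ';') and drop comments.
    static String strip(String grammar) {
        return grammar
                .replaceAll("'(?:\\\\.|[^'\\\\\\r\\n])*'", "LIT")
                .replaceAll("(?s)/\\*.*?\\*/", " ")
                .replaceAll("//[^\\r\\n]*", " ");
    }

    // Heuristic 1: all parser rules that end with EOF, in file order.
    public static List<String> rulesEndingWithEof(String grammar) {
        List<String> result = new ArrayList<>();
        Matcher m = RULE.matcher(strip(grammar));
        while (m.find()) {
            if (ENDS_WITH_EOF.matcher(m.group(2)).matches()) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    // Heuristic 2: the first parser rule in the file, or null if there is none.
    public static String firstParserRule(String grammar) {
        Matcher m = RULE.matcher(strip(grammar));
        return m.find() ? m.group(1) : null;
    }

    // Prefer the first rule ending in EOF, else fall back to the first parser rule.
    public static String guessEntryRule(String grammar) {
        List<String> eofRules = rulesEndingWithEof(grammar);
        return eofRules.isEmpty() ? firstParserRule(grammar) : eofRules.get(0);
    }

    public static void main(String[] args) {
        // Tiny made-up grammar, loosely shaped like the start of a C++ grammar.
        String grammar = "grammar Demo;\n"
                + "translationunit : declarationseq? EOF ;\n"
                + "declarationseq : declaration+ ;\n"
                + "declaration : Identifier ';' ;\n"
                + "Identifier : [a-zA-Z_]+ ;\n";
        System.out.println("Rules ending with EOF: " + rulesEndingWithEof(grammar));
        System.out.println("Guessed entry rule: " + guessEntryRule(grammar));
    }
}
```

For the made-up grammar in main, the only rule ending in EOF is translationunit, so that is the guess. If several rules end in EOF, as in Mike's case, rulesEndingWithEof returns all of them and you still have to pick by reading the grammar.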

Bart Kiers
  • ... and if that is not the case (ending in EOF) chances are the first parser rule in a grammar is the entry point (especially if it is not called from anywhere). On the other hand in one of my grammars I defined half a dozen other rules which end with EOF (mostly to parse sub elements of my language). Sometimes it's tricky... :-) – Mike Lischke Sep 20 '17 at 07:03