
I'm trying to use ANTLR to get a C++ AST, if possible from my C# code base.

Now, the basic workflow seems clear to me: Generate .cs lexer and parser using ANTLRWorks, add them and the ANTLR-references to a C# project, give it a C++ source, work with resulting data structures.

However, I'm already failing at the second step. I downloaded C++ grammars from http://www.antlr.org/grammar/list (I tried "C++ grammar" by Aurelian Melinte and "C++ grammar and code tracer for ANTLR 3.2" by Ramin Zaghi) and generated the lexer and parser for C# by setting "language = CSharp3;" in the grammar's options. However, I can't get the C# project containing the parser and lexer files to compile.

Part of the difficulty is that I have no idea whether the problem lies in the grammars I use or in the versions I'm combining. There are so many different versions of ANTLR, of the C# runtimes and of the C# targets that trying every combination seems a rather hopeless task.

However, the current combination itself seems to work fine: a small example grammar comes out with just one error ("HIDDEN" in the C# lexer needs to be changed to "Hidden", and that's it). The C++ parser/lexer, on the other hand, still gives me lots of compiler errors, mostly dealing with preprocessor directives and array declarations.

Did anyone ever manage to parse C++ with the ANTLR-generated C# files? Does anyone have any idea how this is supposed to work?

Jay
  • How much of C++ do you need? Parsing C++98 in ANTLR was bad enough and things haven't exactly improved with C++11. (E.g. the handling of `>>` which now is much more intuitive) – MSalters Sep 21 '12 at 09:56

2 Answers


The problem is that there is embedded code in both grammars, and that code is written in C++. Embedded code is very common in complex grammars, so you need to find a grammar for parsing C++ in C#, as opposed to just parsing C++. As a side note, if you are able to find one that parses C++ in Java, you can use IKVM to use it from C#.
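
To make this concrete, here is a minimal sketch of what such embedded code looks like in an ANTLR 3 grammar. The grammar name `Demo`, the rules and the `declaredNames` member are hypothetical and are not taken from either grammar mentioned in the question. The text inside `{ ... }` is copied more or less verbatim into the generated parser, so with `language = CSharp3` the C++ statements end up in a .cs file and cannot compile.

```antlr
grammar Demo;   // hypothetical name, for illustration only

options {
    language = CSharp3;   // the target set in the question
}

@members {
    // Embedded code: this is C++, not C#. It is pasted into the
    // generated parser class regardless of the target language.
    std::vector<std::string> declaredNames;
}

// Hypothetical rule with an embedded action written in C++.
declaration
    : type ID ';'
      { declaredNames.push_back($ID.text); }
    ;

type
    : 'int'
    | 'char'
    ;

ID  : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;

// Even this common idiom is target-specific: the question reports that
// the generated C# lexer needed "Hidden" instead of "HIDDEN".
WS  : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

In principle, porting such a grammar to another target means rewriting every action and `@members` block in that target's language; the rules themselves can usually stay as they are.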

erikkallen
  • Thanks for your answer, although it wasn't really what I was hoping for. Doesn't this embedded code kind of make this whole idea of separating grammars and targets obsolete? – Jay Sep 21 '12 at 10:46
  • @Jay: You can actually build pure grammars for C++ and parse it (in the narrow sense of "check syntax" and build a parse tree) without gunking up the grammar with "embedded code". See http://stackoverflow.com/a/4173543/120163. As a practical matter, a pure parser isn't enough; google my essay on "life after parsing". At some point, one has to couple the grammar rules to some type of semantic analysis (at least to build symbol tables), and that "semantic analysis" is unlikely to be written in *your* chosen, favorite, convenient-to-you language (although Java, C#, and C++ guys all hope)... – Ira Baxter Jul 16 '14 at 10:09
  • @Jay: ... so either you have to give up getting a working parser in your convenient language, and/or you have to accept the idea of invoking parsing machinery implemented in another programming system. (That has its own troubles in that the machinery provided by the other programming system for the task may actually be much bigger than what you are coding, and probably better at it). – Ira Baxter Jul 16 '14 at 10:11

The only ANTLR grammar I ever saw for C++ was abandoned by its author as being incomplete, and he was only trying for C++98 (YMMV). C++11 (and yea, verily, C++14) is here and much more complex. Building a production C++ parser is really hard, and unless you can get one that has been tested by fire, it probably doesn't work on real code.

I suggest you use Clang, the EDG C++ front end, or our DMS Software Reengineering Toolkit, all of which have robust C++ parsers. If you want to manipulate the parsed C++ for some purpose, you will want more machinery than a "mere" parser.

Ira Baxter