13

I'm looking to get an AST for C++ that I can then parse with an external program. What programs are out there that are good for generating an AST for C++? I don't care what language it is implemented in or the output format (so long as it is readily parseable).

My overall goal is to transform a C++ unit test bed to its corresponding C# wrapper test bed.

Thomas Eding
  • 1
    "Closed as not constructive?" The OP has a very clear request, and frankly there aren't very many answers so there cannot be much debate. The answers provided so far are supported by specific facts. – Ira Baxter Apr 29 '13 at 18:07
  • check this out: http://eli.thegreenplace.net/2011/07/03/parsing-c-in-python-with-clang/ – Janus Troelsen May 12 '13 at 18:14
  • and [any C/C++ refactoring tool based on libclang? (even simplest “toy example” )](http://stackoverflow.com/q/7969109/309483) – Janus Troelsen May 12 '13 at 18:15

3 Answers

12

You can use clang, and especially libclang, to parse C++ code. It is a very high-quality, hand-written library for lexing, parsing, and compiling C++ code, and it can also generate an AST.

Clang also supports C, Objective-C and Objective-C++. Clang itself is written in C++.
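As a rough sketch of what "readily parseable" can look like here: recent clang versions can dump the AST as JSON with `clang++ -Xclang -ast-dump=json -fsyntax-only test.cpp`, where nodes carry a `kind`, an optional `name`, and their children under `inner`. The Python below walks such a dump and lists function and method declarations. The sample is a hand-written, heavily trimmed stand-in (real dumps are far larger), and `test.cpp`, `add`, `Fixture`, and `run` are made-up names.

```python
import json

# Hand-written, trimmed stand-in for the output of:
#   clang++ -Xclang -ast-dump=json -fsyntax-only test.cpp
# (test.cpp and all names in it are hypothetical.)
SAMPLE_AST = """
{"kind": "TranslationUnitDecl", "inner": [
  {"kind": "FunctionDecl", "name": "add",
   "type": {"qualType": "int (int, int)"},
   "inner": [
     {"kind": "ParmVarDecl", "name": "a", "type": {"qualType": "int"}},
     {"kind": "ParmVarDecl", "name": "b", "type": {"qualType": "int"}}
   ]},
  {"kind": "CXXRecordDecl", "name": "Fixture", "inner": [
     {"kind": "CXXMethodDecl", "name": "run",
      "type": {"qualType": "void ()"}}
  ]}
]}
"""


def walk(node, depth=0):
    """Yield (depth, node) for a node and all of its descendants, pre-order."""
    yield depth, node
    for child in node.get("inner", []):
        yield from walk(child, depth + 1)


def declarations(ast, kinds=("FunctionDecl", "CXXMethodDecl")):
    """Collect (kind, name, type) for every node whose kind is in `kinds`."""
    return [
        (n["kind"], n.get("name"), n.get("type", {}).get("qualType"))
        for _, n in walk(ast)
        if n.get("kind") in kinds
    ]


if __name__ == "__main__":
    for kind, name, qual_type in declarations(json.loads(SAMPLE_AST)):
        print(f"{kind}: {name} : {qual_type}")
```

For a real file you would pipe the clang output into `json.loads` instead of using the embedded sample.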

Phong
  • Any reason for the "spoiler"-type blockquote? – Bart Jan 26 '12 at 20:40
  • I can't figure out how to get the AST from clang. Are there any tutorials on how to do this? – Thomas Eding Jan 26 '12 at 22:58
  • 1
    I think that [these videos and slides](http://llvm.org/devmtg/2010-11/) are a very good start. I learnt how to implement syntax highlighting using them but it's basically the same thing, you can use it to walk the AST. –  Jan 27 '12 at 16:06
  • 2
    For the benefit of others: libclang (the C binding) is terrible and incomplete. clang (the C++ binding) is wonderful. – Thomas Eding Nov 20 '14 at 18:56
8

Actually, GCC will emit the AST at any stage in the pipeline that interests you, including the GENERIC and GIMPLE forms. Check out the (plethora of) command-line switches beginning with -fdump-, e.g. -fdump-tree-original-raw.

This is one of the easier (…) ways to work, as you can use it on arbitrary code; just pass the appropriate CFLAGS or CXXFLAGS into most Makefiles:

    make CXXFLAGS=-fdump-tree-original-raw all

… and you get “the works.”
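To read the result from an external program, here is a minimal Python sketch. It assumes the raw dump uses the `@N node_kind key: value ...` layout, with extra fields on indented continuation lines; check that against what your GCC version actually writes, since dump file names and details vary by version. The sample text is hand-written, and `t.cpp` is a made-up file name.

```python
import re

# Hand-written excerpt in the assumed "@N kind key: value" layout.
SAMPLE_DUMP = """\
@1      function_decl    name: @2       type: @3       srcp: t.cpp:1
                         link: extern
@2      identifier_node  strg: main     lngt: 4
@3      function_type    size: @4       algn: 8        retn: @5
"""

NODE_RE = re.compile(r"^@(\d+)\s+(\w+)\s*(.*)$")
# Keys are short lowercase words, optionally followed by an index ("op 0").
FIELD_RE = re.compile(r"([a-z]+(?: \d+)?)\s*:\s+(\S+)")


def parse_raw_dump(text):
    """Return {node_id: {"kind": str, "fields": {key: value}}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        match = NODE_RE.match(line)
        if match:
            current = {"kind": match.group(2), "fields": {}}
            nodes[int(match.group(1))] = current
            rest = match.group(3)
        elif current is not None and line.startswith(" "):
            rest = line  # continuation of the previous node
        else:
            current = None  # header, blank line, or anything unrecognized
            continue
        for key, value in FIELD_RE.findall(rest):
            current["fields"][key] = value
    return nodes


def function_names(nodes):
    """Follow each function_decl's name reference to its identifier string."""
    names = []
    for node in nodes.values():
        ref = node["fields"].get("name", "")
        if node["kind"] == "function_decl" and ref.startswith("@"):
            ident = nodes.get(int(ref[1:]))
            if ident is not None:
                names.append(ident["fields"].get("strg"))
    return names


if __name__ == "__main__":
    print(function_names(parse_raw_dump(SAMPLE_DUMP)))
```

Values that contain spaces (string literals, for instance) would be cut at the first space by this simplified field pattern.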

Updated: I saw this neat little graphing system based on GCC ASTs while checking my flag name :-) Google FTW.

http://digitocero.com/en/blog/exporting-and-visualizing-gccs-abstract-syntax-tree-ast

BRPocock
2

Our C++ Front End, built on top of our DMS Software Reengineering Toolkit, can parse a variety of C++ dialects (including C++11 and Objective-C) and export the AST as an XML document with a command-line switch. See example ASTs produced by this front end.

As a practical matter, you will need more than the AST; you can't really do much with C++ (or any other modern language) without an understanding of the meaning and scope of each identifier. For C++, meaning/scope are particularly ugly. The DMS C++ front end handles all of that; it can build full symbol tables associating identifiers with explicit C++ types. That information isn't dumpable in XML with a command-line switch, but it is "technically easy" to code logic in DMS to walk the symbol table and spit out XML. (There is an option to dump this information, just not in XML format.)

I caution you against the idea of manipulating (or even just analyzing) the XML. First, XSLT isn't a particularly good way to understand the meaning of the ASTs, let alone transform the AST, because the ASTs represent context-sensitive language structures (that's why you want [nee MUST HAVE] the symbol table). You can read the XML into a DOM-like tree if you like and write your own procedural code to manipulate it. But source-to-source transformations are an easier way; you can write your transformations using C++ notation rather than buckets of code goo climbing over a tree data structure.
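To make the "procedural code climbing over a tree" option concrete, here is a minimal Python sketch that reads an XML AST into a tree and collects identifier names. The XML schema (`tree`, `node`, `type`, `literal`) is invented purely for illustration and is not the actual DMS export format; `TestAdd` and `assert_equal` are made-up names.

```python
import xml.etree.ElementTree as ET

# Invented schema for illustration only; NOT the real DMS XML export format.
SAMPLE_XML = """
<tree>
  <node type="function_definition">
    <node type="identifier" literal="TestAdd"/>
    <node type="compound_statement">
      <node type="call">
        <node type="identifier" literal="assert_equal"/>
      </node>
    </node>
  </node>
</tree>
"""


def identifiers(root):
    """Return the literal text of every identifier node, in document order."""
    return [
        node.get("literal")
        for node in root.iter("node")
        if node.get("type") == "identifier"
    ]


if __name__ == "__main__":
    print(identifiers(ET.fromstring(SAMPLE_XML)))
```

Note what this gives you: bare names. Nothing in the tree says which declaration `assert_equal` refers to or what its type is, which is exactly the gap the symbol table fills.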

You'll have another problem: how to generate valid C++ code from the transformed XML. If you don't mind spitting out raw text, you can solve this problem in purely ad hoc ways, at the price of having no guarantee other than sweat that the generated code is syntactically valid. If you want to generate a C++ representation of your final result as an AST, and regenerate valid text from that, you'll need a prettyprinter, which is not technically hard to build but is still a lot of work, especially for a language as big as C++.

Finally, the reason that tools like DMS exist is to provide the vast amount of infrastructure it takes to process/manipulate complex structures such as C++ ASTs (parse, analyse, transform, prettyprint). You can try to replicate all this machinery yourself, but this is usually a poor time/cost/productivity tradeoff. The claim is that it is best to stay within the tool ecosystem rather than escape it and build bad versions of it yourself. If you haven't done this before, you'll find this out painfully.

FWIW, DMS has been used to carry out massive analysis and transformations on C++ source code. See Publications on DMS and check the papers by Akers on "Re-engineering C++ Component Models".

Clang is based on the same kind of philosophy; there's an ecosystem of tools.

YMMV, but I'd be surprised.

Ira Baxter