In Short
I need some kind of AST representation of the GCC and Clang source code. Due to their size and complexity, I cannot find an easy way to get one.
The Details
For a project whose goal is to compare large programs with respect to their ASTs (similar to how DECKARD does it), I need AST representations of GCC and Clang. (Note that I do not necessarily need a single AST: I am completely content with one AST per translation unit, and I don't need a symbol table or headers.)
After some research I found a few ways to get the AST. However, each of them seems to have its own issues:
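The comparison described above can be illustrated in a simplified form. DECKARD summarises AST subtrees as characteristic vectors of node counts and treats subtrees whose vectors lie close together as similar. The Python sketch below is only an illustration under assumptions: it counts plain node kinds (no filtering of irrelevant node types, no minimum subtree size, and no locality-sensitive hashing as DECKARD uses for clustering), and it assumes Clang-JSON-style nodes, i.e. dicts with a `"kind"` string and an optional `"inner"` list of children. The function names are hypothetical.

```python
import math
from collections import Counter


def characteristic_vector(node):
    """Count node kinds in the subtree rooted at `node`.

    Assumes Clang-JSON-style nodes: dicts with a "kind" string and an
    optional "inner" list of child nodes.
    """
    counts = Counter()
    stack = [node]  # explicit stack, so deep ASTs do not hit the recursion limit
    while stack:
        current = stack.pop()
        counts[current.get("kind", "?")] += 1
        stack.extend(current.get("inner", []))
    return counts


def vector_distance(a, b):
    """Euclidean distance between two sparse count vectors (missing kinds count as 0)."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))
```

For example, a function body returning `1` and one returning `1 + 2` differ by one `BinaryOperator` and one `IntegerLiteral` node, so their distance is sqrt(2). Comparing every subtree of one program against every subtree of another this way is quadratic, which is why DECKARD clusters the vectors instead.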
1. Using the Clang frontend:
   clang -emit-ast foo.c
   - This seems to work well for small projects, but managing all the include paths for the GCC source code has proved difficult, resulting in many "undeclared type/identifier" errors.
2. Using the Clang frontend with:
   clang -Xclang -ast-dump foo.c >> a.xml
   - Same issue as above, but some XML output is still produced, so the XML would have to be parsed. (Also: is this output incomplete or erroneous?)
3. Writing a (F)LEX + YACC/BISON parser for C++ along the lines of FOG.
   - This sounds like a lot of effort and is error-prone.
4. Using the GCC frontend:
   gcc -fdump-tree-all-graph foo.c
   - The generated .dot file(s) would have to be parsed, so I would again have to write a (F)LEX + YACC/BISON parser. I also suspect the same "undeclared symbols" issue as with option 1 might arise.
5. Using the DMS software suggested by this answer.
   - This software is proprietary.
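Regarding parsing the dump, a hand-written grammar may not be necessary with a recent Clang: as far as I know, Clang 9 and later accept `clang -Xclang -ast-dump=json -fsyntax-only foo.c`, which prints the translation unit's AST as a single JSON object whose nodes carry a `"kind"`, an optional `"name"`, and an `"inner"` list of children. The minimal Python sketch below walks such a dump; it assumes that format and skips nodes marked `"isImplicit"` (declarations Clang adds on its own, such as builtin typedefs). The helper names are hypothetical, and the sketch does nothing about the include-path errors from option 1.

```python
import json


def walk(root):
    """Yield (depth, kind, name) for every non-implicit node of a Clang JSON AST.

    Implicit nodes are skipped together with their subtrees.
    """
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node.get("isImplicit"):
            continue
        yield depth, node.get("kind"), node.get("name")
        # Push children in reverse so they are visited in source order.
        for child in reversed(node.get("inner", [])):
            stack.append((child, depth + 1))


def walk_dump(text):
    """Parse the JSON text printed by Clang and return the walked nodes as a list."""
    return list(walk(json.loads(text)))
```

Usage would be along the lines of redirecting the dump to `foo.json` and calling `walk_dump(open("foo.json").read())`. For a real GCC translation unit the dump also contains every declaration pulled in from headers, so it can be very large.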
My Questions
- Does anyone have a comparatively simple idea of how to proceed?
- Are the XML files from option 2 erroneous, or are they missing AST nodes?
- Is there a Clang flag that suppresses the "undeclared type/identifier" errors?
- Is there an easier way to find all required include paths than going through each file individually or trying to understand the 31k lines of the corresponding autogenerated GCC Makefile?
- Is the FOG parser of option 3 hard to adapt so that it outputs some kind of AST representation?
- Do other (reliable) sources of C++ LEX and YACC files exist somewhere? (I know a C version exists here.)
- Are there other options that I do not see to get AST representations of GCC and Clang?
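On the include-path question, one common approach (an assumption about the setup, not something verified on the GCC tree) is to let the build record every compiler invocation in a JSON compilation database, `compile_commands.json`. CMake writes one for Clang when configured with `-DCMAKE_EXPORT_COMPILE_COMMANDS=ON`, and for GCC's autotools build an interception tool such as Bear (`bear -- make` in Bear 3.x) can produce one. Each entry lists a `directory`, a `file`, and either an `arguments` list or a single `command` string. The hypothetical helper below turns each entry into a `clang -emit-ast` invocation that keeps only the include-path, macro and language-standard flags, so the paths and defines of the real build are replayed per translation unit.

```python
import json
import shlex
from pathlib import Path

# Flags whose value is the next argument, e.g. "-I /path".
SEPARATE_VALUE_FLAGS = {"-I", "-D", "-U", "-isystem", "-iquote", "-include"}
# Flags whose value is glued to the option, e.g. "-I../include", "-DX=1", "-std=gnu++11".
JOINED_PREFIXES = ("-I", "-D", "-U", "-std=")


def emit_ast_commands(db_text):
    """Map compilation database entries to (directory, argv) pairs for clang -emit-ast.

    Only preprocessor-relevant flags are kept; GCC-specific options are dropped.
    """
    commands = []
    for entry in json.loads(db_text):
        args = entry.get("arguments") or shlex.split(entry["command"])
        kept = []
        i = 1  # args[0] is the original compiler, replaced by clang below
        while i < len(args):
            arg = args[i]
            if arg in SEPARATE_VALUE_FLAGS and i + 1 < len(args):
                kept += [arg, args[i + 1]]
                i += 2
                continue
            if arg.startswith(JOINED_PREFIXES) and arg not in SEPARATE_VALUE_FLAGS:
                kept.append(arg)
            i += 1
        source = entry["file"]
        output = Path(source).with_suffix(".ast").name
        argv = ["clang", "-emit-ast", *kept, source, "-o", output]
        commands.append((entry["directory"], argv))
    return commands
```

Each command is meant to be run with its entry's directory as the working directory, e.g. via `subprocess.run(argv, cwd=directory)`. Dropping GCC-only options is a deliberate choice, since Clang rejects options it does not know; sources with the same base name in one build directory would overwrite each other's `.ast` file, so a real script would need unique output names.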
Thanks a lot in advance.