
I managed to compile Clang successfully for Windows with CMake and Visual Studio 10. I would like to get an XML file as an AST representation of the source code. There is one option that gives this result with Clang (built with GCC) under Linux (Ubuntu), but it doesn't work on the Windows box:

clang -cc1 -ast-print-xml source.c

However, this invokes the compilation stage, which I would like to avoid. Digging into the source code hasn't helped so far, as I am quite new to Clang. I did manage to generate the binary version of the AST by using:

clang -emit-ast source.c

Unfortunately, this format cannot be parsed directly. Is there an existing way to have Clang generate the XML tree directly instead of a binary one?

The goal is to use the XML representation in other tools in the .NET environment; otherwise I would need to write a wrapper around the native Clang library to access the binary AST. Maybe there is a third option, if someone has already written a binary Clang AST parser for .NET?

Or am I missing something, for example that the AST generated by the Clang front end is not equivalent to the one generated in the compilation stage?

durron597
jdehaan
  • My company builds C++ front ends, and we *can* emit complete XML dumps of the ASTs. We have this as a check-box item, because people ask for it. Nobody really uses it, because the amount of output for a real C++ program (which includes all the header files) is simply *enormous*, which makes it slow and clumsy to deal with. The real question is, why do you want to do this? Clang likely already offers a vast amount of machinery to process the C++ AST directly (as does our corresponding tool); why would you want to try to replicate all of that work? Why not just use Clang for your purpose? – Ira Baxter Mar 19 '11 at 17:42
  • ... see a C++ tree dump at http://stackoverflow.com/a/17393852/120163 This isn't XML, but the tool can produce XML also with the exact same content. – Ira Baxter Apr 12 '16 at 10:02

3 Answers


For your information, the XML printer was removed in version 2.9 by Douglas Gregor (who is responsible for the Clang front end).

The issue was that the XML printer was incomplete. A number of AST nodes had never been implemented in the printer, nor had a number of properties of some nodes, which led to an inaccurate representation of the source code.

Another point raised by Douglas was that the output should be suitable not for debugging Clang itself (which is what -emit-ast is for) but for consumption by external tools. This requires the output to be stable from one version to the next. Notably, it should not be a one-to-one mapping of Clang internals, but should rather translate the source code into a standardized language.

Unless significant work is done on the printer (which requires volunteers), it will not be integrated back...

Alba Mendez
Matthieu M.
  • The funny part is that `-emit-ast` pretty-prints types instead of representing their structure, and for this reason is absolutely useless. Only the XML printer made it possible to debug and automatically verify the types in declarations. – SK-logic Mar 18 '11 at 13:27
  • @SK-logic: since XML is no longer an option, we might see an improvement in the `-emit-ast` behavior. – Matthieu M. Mar 18 '11 at 13:29
  • Thanks for all this interesting information. I will have a look at the old XML printer and try to see if I can make something useful with it for my own usage. Having a universal/standardized way of representing source code would really be a good thing, but a common denominator implies throwing away features, while keeping the specific features of every kind of language makes it too complex... An extensible approach would be nice... For now, thanks a lot for this answer. – jdehaan Mar 18 '11 at 20:39
  • It seems the current version (3.2) has it available in debug mode; I was able to extract XML from it. Version 2.9, however, seems unable to do so for me, though. – Oeufcoque Penteano Jun 10 '12 at 04:25
  • @OeufcoquePenteano: How? Link? – Janus Troelsen May 12 '13 at 18:28

I've been working on my own tool for extracting XML from Clang's AST. It uses the libclang Python bindings to traverse the AST.

The code is available at https://github.com/BentleyJOakes/PCX

Edit: I should add that it is quite incomplete in terms of producing the right source code tokens for each AST node. This unfortunately needs to be coded for each AST node type. However, the code should give a basis for anyone who wants to pursue this further.
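The general idea of walking the AST through the libclang Python bindings and emitting XML can be sketched as follows. This is not the PCX code, only a minimal illustration assuming the `clang` Python package (`clang.cindex`) and a libclang shared library matching your Clang version are installed. The XML schema (`CursorKind` names as element names, `spelling`/`line`/`column` as attributes) and the script name `ast2xml.py` are hypothetical choices. `Index.parse` only runs the front end, and `TranslationUnit.from_ast_file` can load a file written by `clang -emit-ast`, generally only if it comes from the same Clang version as the libclang in use.

```python
"""Sketch: dump a libclang AST as XML using the Python bindings.

Assumes the `clang` package (clang.cindex) and a matching libclang are
installed. The XML schema (CursorKind names as tags, spelling/line/column
as attributes) is a hypothetical choice for illustration.
"""
import sys
import xml.etree.ElementTree as ET


def cursor_to_element(cursor, main_file=None):
    """Recursively convert a cursor and its children into an XML element.

    Only uses Cursor.kind.name, Cursor.spelling, Cursor.location and
    Cursor.get_children(). If main_file is given, children located in
    another file (e.g. system headers) are skipped; file names are
    compared as plain strings.
    """
    elem = ET.Element(cursor.kind.name)  # e.g. "FUNCTION_DECL"
    if cursor.spelling:
        elem.set("spelling", cursor.spelling)
    loc = cursor.location
    if loc.file is not None:  # the translation unit itself has no file
        elem.set("line", str(loc.line))
        elem.set("column", str(loc.column))
    for child in cursor.get_children():
        child_file = child.location.file
        if (main_file is not None and child_file is not None
                and child_file.name != main_file):
            continue
        elem.append(cursor_to_element(child, main_file))
    return elem


def main(argv):
    if len(argv) < 2:
        sys.stderr.write("usage: ast2xml.py FILE.c|FILE.ast [clang args...]\n")
        return
    from clang.cindex import Index, TranslationUnit  # requires libclang

    path = argv[1]
    if path.endswith(".ast"):
        # Load a file written by `clang -emit-ast` (same Clang version needed).
        tu = TranslationUnit.from_ast_file(path)
    else:
        # Index.parse only runs the front end; no object file is produced.
        tu = Index.create().parse(path, args=argv[2:])
    root = cursor_to_element(tu.cursor, main_file=tu.spelling)
    ET.indent(root)  # pretty-printing, Python 3.9+
    sys.stdout.write(ET.tostring(root, encoding="unicode") + "\n")


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python ast2xml.py source.c` parses the file and prints one element per cursor. As the answer notes, this only captures node kinds, names and positions; reproducing the exact source tokens for each node would still need per-node-type work.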


Using a custom ASTDumper would do the job without, of course, compiling any source file (Clang stops after the front end). However, you have to deal with the whole C and C++ code base of LLVM to accomplish that.

issamux
  • Internally, Clang implements a JSONNodeDumper and a TextNodeDumper. I think it would be more convenient to get XML output from the JSONNodeDumper's output by running it through a converter library. – Layne Liu Apr 20 '23 at 12:13
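Along the lines of that comment, a JSON-to-XML converter can be quite small. The following is a minimal sketch in Python, not any particular converter library. It assumes the JSON was produced by Clang's JSON dumper, e.g. `clang -Xclang -ast-dump=json -fsyntax-only source.c > ast.json` (available in recent Clang versions, 9 and later as far as I know), where each AST node is an object with a `kind` key and its children in an `inner` array. The mapping (node kind as element name, scalar fields as attributes, nested objects such as `loc` or `type` as child elements) and the script name `json2xml.py` are hypothetical choices, not a standard schema.

```python
"""Sketch: convert Clang's JSON AST dump into XML.

Assumes input produced by something like
    clang -Xclang -ast-dump=json -fsyntax-only source.c > ast.json
where every AST node is a JSON object with a "kind" key and its children
in an "inner" list. The XML mapping below is a hypothetical choice.
"""
import json
import sys
import xml.etree.ElementTree as ET


def scalar_text(value):
    """Render a JSON scalar the way JSON spells it (true/false/null)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_element(obj, tag=None):
    """Convert a JSON object to an element.

    With tag=None, obj is treated as an AST node and its "kind" becomes the
    element name. Otherwise (nested objects such as "loc", "range", "type")
    the JSON key is the element name and "kind", if present, is kept.
    """
    is_node = tag is None
    elem = ET.Element(obj.get("kind", "node") if is_node else tag)
    for key, value in obj.items():
        if key == "inner" or (is_node and key == "kind"):
            continue
        if isinstance(value, dict):
            elem.append(to_element(value, key))
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    elem.append(to_element(item, key))
                else:
                    ET.SubElement(elem, key).text = scalar_text(item)
        else:
            elem.set(key, scalar_text(value))  # ids, names, flags, offsets
    for child in obj.get("inner", []):  # child AST nodes
        elem.append(to_element(child))
    return elem


def main(argv):
    if len(argv) < 2:
        sys.stderr.write("usage: json2xml.py AST.json\n")
        return
    with open(argv[1], encoding="utf-8") as f:
        root = to_element(json.load(f))
    ET.indent(root)  # pretty-printing, Python 3.9+
    sys.stdout.write(ET.tostring(root, encoding="unicode") + "\n")


if __name__ == "__main__":
    main(sys.argv)
```

This loads the whole dump into memory, and as Ira Baxter points out above, the dump for a real program including its headers can be very large, so filtering the JSON before converting it may be worthwhile.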