
I am new to compiler work and am using an open-source ANTLR grammar to parse C source files that pull in many external header files (include files, library headers, and so on).

What is the way to define a grammar for these header files? Is there some way to parse these include files as ordinary source files?

Is it possible to combine all these source and include files into one package and parse it with ANTLR or another C parser (for example, one built with JavaCC)?

Waiting for your kind suggestions.

user628127
  • The standard approach would be to preprocess the #include statements and then use your c grammar against the resulting file. – Jimmy Feb 22 '11 at 12:41
  • Do you want to build your own parser, or are you interested in getting at the information in the C headers? – Ira Baxter Feb 23 '11 at 03:30
  • How do I preprocess the #include files? My domain is Java and Windows, and I don't want to use the gcc preprocessor. – user628127 Feb 23 '11 at 03:45
  • @Ira: I want to build a C parser with the ANTLR/JavaCC C grammar, but I found the grammar is not enough for parsing include files. – user628127 Feb 23 '11 at 03:47
  • What "it" are you discussing? And why isn't "it" good enough? Presumably the header files contain valid C syntax. Why can't Antlr parse the header file, if given it as input? [You may have a problem that Antlr already has a stream open to the main file, and you have to open another (sub)parser on the include file; that's an organizational issue in your parser, and not a grammar problem; you need a stack of input streams and the machinery to switch when you encounter a new #include or an EOF.] – Ira Baxter Feb 23 '11 at 03:51
  • "It" refers to ANTLR. Is it possible to pass #include files as simple C source files to ANTLR? What I want to achieve is to extract base metrics from the AST that ANTLR generates for C source. I am using ANTLR with the Eclipse IDE. – user628127 Feb 23 '11 at 04:15
  • @user628127, _"Is it possible to pass #include files as simple C source files to ANTLR?"_, the answer is: "yes". Of course, the path to these external files must be known. While parsing one C file, just create an instance of a new (ANTLR generated) parser to parse each `#include`d file (and so on). But this is what Ira Baxter already suggested. – Bart Kiers Feb 23 '11 at 09:08
  • @user628127, you *do not want* to or you *cannot*? Anyway, you can use the `cl.exe` preprocessor too. If you've already got C headers on your system, there must be a C compiler somewhere too. And I'd warn you against parsing #include files one by one: chances are you'd miss all the proper #ifdef selections and end up with a number of conflicting redeclarations. It is not a good idea at all. – SK-logic Feb 23 '11 at 10:46
  • @IraBaxter Hi Ira I have a similar issue but all I want to do is "getting at the information in the C headers" (your first comment above suggests you know how). Do you know if ANTLR already has grammar for this sort of 'lexical' analysis? Literally all i want is to input my C header file and output the information about each method contained within that file (return type, parameters, etc). Please help! (I tried asking this question here to no avail: http://stackoverflow.com/questions/24707331/parsing-c-header-files-in-c-sharp ) Thanks! – stackPusher Jul 15 '14 at 18:46
  • @stackPopper: see my response to your question. – Ira Baxter Jul 16 '14 at 09:32
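
The "stack of input streams" idea from the comments can be sketched in a few lines. This is only an illustration under stated assumptions: `expand_includes` and the `read_file` callback are hypothetical names, only quoted `#include "..."` lines are followed, and no macro expansion or `#ifdef` evaluation is done, so it does not replace a real preprocessor.

```python
import re
from typing import Callable, Iterator, List

# Matches only the quoted form: #include "name.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"')


def expand_includes(main_name: str, read_file: Callable[[str], str]) -> List[str]:
    """Return the lines of main_name with quoted #include lines replaced
    by the contents of the named file, recursively."""
    output: List[str] = []
    active: List[str] = [main_name]  # files currently open, used to detect cycles
    stack: List[Iterator[str]] = [iter(read_file(main_name).splitlines())]

    while stack:
        line = next(stack[-1], None)
        if line is None:
            # EOF on the current stream: switch back to the including file.
            stack.pop()
            active.pop()
            continue
        match = INCLUDE_RE.match(line)
        if match is None:
            # Ordinary line (or an #include <...> we do not follow): pass through.
            output.append(line)
            continue
        name = match.group(1)
        if name in active:
            raise ValueError(f"circular include: {name}")
        # New #include: push a fresh stream and read from it first.
        active.append(name)
        stack.append(iter(read_file(name).splitlines()))
    return output
```

The flattened line list could then be joined and handed to a single parser instance. Because conditional compilation is ignored here, the redeclaration problem SK-logic warns about still applies to real headers.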

1 Answer


It won't be easy to implement a full preprocessor-plus-parser chain for C. Instead, you can reuse an existing preprocessor (e.g., `gcc -E`) and an existing parser (`clang -Xclang -ast-print-xml` or gcc-xml are both good choices), and then parse the resulting XML output, which is a much simpler job.
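
As a rough sketch of the first half of that chain, the preprocessor can be driven from a script. The helper names `preprocess_command` and `preprocess` are hypothetical, and the sketch assumes a `gcc` executable is on the PATH; `-E` stops after preprocessing, `-P` suppresses line markers, and `-I`/`-D` add include directories and macro definitions.

```python
import subprocess
from typing import List, Sequence


def preprocess_command(source: str,
                       include_dirs: Sequence[str] = (),
                       defines: Sequence[str] = (),
                       compiler: str = "gcc") -> List[str]:
    """Build the argument list for a preprocess-only compiler run."""
    cmd = [compiler, "-E", "-P"]
    cmd += [f"-I{directory}" for directory in include_dirs]
    cmd += [f"-D{macro}" for macro in defines]
    cmd.append(source)
    return cmd


def preprocess(source: str,
               include_dirs: Sequence[str] = (),
               defines: Sequence[str] = ()) -> str:
    """Run the preprocessor and return the expanded translation unit as text.

    Assumes gcc is installed; raises CalledProcessError if it reports an error.
    """
    cmd = preprocess_command(source, include_dirs, defines)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `preprocess("main.c", include_dirs=["include"])` would return one self-contained C text with every `#include` and `#ifdef` already resolved, which a plain C grammar can then parse.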

SK-logic
  • According to this post https://stackoverflow.com/a/5352066/646942 the `-ast-print-xml` feature is no longer available in `clang`. – Matze Jul 27 '20 at 13:08
  • @Matze Yes, the modern alternative is to use Python CIndex bindings instead. See an example here: https://github.com/combinatorylogic/clike/blob/809f7e5998a438f7e2bf5fee86472c45d019fc35/llvm-wrapper/rebuild.py – SK-logic Jul 27 '20 at 18:12
  • Interesting; I will check that out. I also found CppAst.NET, which also uses `clang` and looks quite promising: https://github.com/xoofx/CppAst.NET – Matze Jul 27 '20 at 21:55
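
For the "return type and parameters of each function in a header" use case raised in the comments, the CIndex route might look roughly like the sketch below. It is not taken from the linked script: `format_signature` and `list_functions` are hypothetical helpers, and it assumes the `clang.cindex` Python module and a matching libclang library are installed.

```python
from typing import List, Sequence


def format_signature(return_type: str, name: str, param_types: Sequence[str]) -> str:
    """Render a one-line description of a function declaration."""
    return f"{return_type} {name}({', '.join(param_types)})"


def list_functions(header_path: str,
                   clang_args: Sequence[str] = ("-x", "c")) -> List[str]:
    """List every function declaration libclang sees while parsing header_path,
    including declarations that come from headers it includes."""
    # Imported lazily so this module still loads where libclang is missing.
    from clang.cindex import CursorKind, Index

    index = Index.create()
    translation_unit = index.parse(header_path, args=list(clang_args))

    signatures: List[str] = []
    for cursor in translation_unit.cursor.walk_preorder():
        if cursor.kind != CursorKind.FUNCTION_DECL:
            continue
        param_types = [arg.type.spelling for arg in cursor.get_arguments()]
        signatures.append(
            format_signature(cursor.result_type.spelling, cursor.spelling, param_types)
        )
    return signatures
```

Include paths and macro definitions can be passed through `clang_args` (for example `("-x", "c", "-Iinclude")`), since libclang runs its own preprocessor before building the AST.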