7

I have a C header file defining a couple of structs, each containing multiple char arrays.

I'd like to parse these files using Java. Is there a library for reading C-Header files either into a structure or is there a stream parser that understands C-Header files?

Just for more background (I'm just looking for a C-Header parser, not a solution for this particular problem): I have a text file containing data and a C-Header file explaining the structure. Both are a bit dynamic, so I don't want to generate Java class files.

example:

#define TYPE1
typedef struct type1
{
char name1[10];
char name2[5];
}
#endif

Type2, Type3 etc are similar.

Data structure:

type1ffffffffffaaaaa
PhilW
  • the man-pages for lex and yacc (flex and bison) may help – Peter Miehle Apr 24 '12 at 14:49
  • Oh good point, I forgot about yacc. I was hoping there would already be an existing solution for this. But I guess I could create my own streaming parser using yacc. – PhilW Apr 24 '12 at 14:59
  • @PhilW, have you created your parser? I need to parse C Header containing preprocessor definitions (just integer constants). I need to get these constants from C Header to Java application. Of course I'm looking for some kind of ready-made solution before implementing my own one =) – Dmitry Frank Aug 12 '13 at 13:17
  • Hey Dmitry, unfortunately not - I didn't get around to it and got caught up with other things... But if you find the time, let me know ;-) (I struggled with the CDT thing below for a while, but didn't get it to work, I think I would rather do my own parser) – PhilW Aug 13 '13 at 05:29

4 Answers

18

You can use an existing C parser for Java. It does a lot more than parsing header files, of course, but that shouldn't hurt you.

We use the parser from the Eclipse CDT project. It is an Eclipse plugin, but we successfully use it outside of Eclipse; we just have to bundle 3 Eclipse JAR files with the parser JAR.

To use the CDT parser, start with an implementation of org.eclipse.cdt.core.model.ILanguage, for example org.eclipse.cdt.core.dom.ast.gnu.c.GCCLanguage. You can call getASTTranslationUnit on it, passing the code and some helper objects. A code file is represented by an org.eclipse.cdt.core.parser.FileContent instance (at least in CDT 7; this API seems to change a lot). The easiest way to create such an object is FileContent.createForExternalFileLocation(filename) or FileContent.create(filename, content). This way you don't need to deal with the Eclipse IFile machinery, which seems to work only within projects and workspaces.

The IASTTranslationUnit you get back represents the whole AST of the file. All the nodes therein are instances of IASTSomething types, for example IASTDeclaration etc. You can implement your own subclass of org.eclipse.cdt.core.dom.ast.ASTVisitor to iterate through the AST using the visitor pattern. If you need further help, just ask.

The JAR files we use are org.eclipse.cdt.core.jar, org.eclipse.core.resources.jar, org.eclipse.equinox.common.jar, and org.eclipse.osgi.jar.
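The steps above can be sketched roughly as follows. This is only an illustration, assuming the CDT JARs listed above are on the classpath and a reasonably recent CDT version; the class name HeaderAstDump, the file name example.h, the empty scanner info, the null index, and the options value 0 are all choices made for this sketch, not something taken from our code base.

```java
import java.util.HashMap;
import java.util.Map;

import org.eclipse.cdt.core.dom.ast.ASTVisitor;
import org.eclipse.cdt.core.dom.ast.IASTDeclaration;
import org.eclipse.cdt.core.dom.ast.IASTTranslationUnit;
import org.eclipse.cdt.core.dom.ast.gnu.c.GCCLanguage;
import org.eclipse.cdt.core.parser.DefaultLogService;
import org.eclipse.cdt.core.parser.FileContent;
import org.eclipse.cdt.core.parser.IncludeFileContentProvider;
import org.eclipse.cdt.core.parser.ScannerInfo;

// Hypothetical class name; requires the CDT JARs on the classpath.
public class HeaderAstDump {
    public static void main(String[] args) throws Exception {
        // Parse from a string in memory; "example.h" is only a label here.
        String code = "typedef struct type1 { char name1[10]; char name2[5]; } type1;";
        FileContent content = FileContent.create("example.h", code.toCharArray());

        // Assumption: no predefined macros and no include search paths.
        Map<String, String> macros = new HashMap<>();
        ScannerInfo scannerInfo = new ScannerInfo(macros, new String[0]);

        IASTTranslationUnit tu = GCCLanguage.getDefault().getASTTranslationUnit(
                content,
                scannerInfo,
                IncludeFileContentProvider.getEmptyFilesProvider(), // do not follow #include
                null,                                               // no index
                0,                                                  // default parser options
                new DefaultLogService());

        // Visit every declaration and print its source text.
        ASTVisitor visitor = new ASTVisitor() {
            { shouldVisitDeclarations = true; }

            @Override
            public int visit(IASTDeclaration declaration) {
                System.out.println(declaration.getRawSignature());
                return PROCESS_CONTINUE;
            }
        };
        tu.accept(visitor);
    }
}
```

Other node kinds can be visited by setting the corresponding shouldVisit... flag and overriding the matching visit method.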

Edit: I had found a paper which contains source code snippets for this: "Using the Eclipse C/C++ Development Tooling as a Robust, Fully Functional, Actively Maintained, Open Source C++ Parser", but it is no longer available online (only as a shortened version).

Philipp Wendler
  • Sounds like a good fit! Would you mind throwing in a few keywords or pointers on how to do this? (what are the main classes in this scenario?) – PhilW Apr 24 '12 at 15:18
  • In fact I do have a question: In my sample above, I have a macro "#define type1" - how do I make my ASTVisitor visit that? – PhilW Apr 25 '12 at 12:29
  • I'm not sure how it supports pre-processor statements, as our code has none of them. But there are some methods in `IASTTranslationUnit` which seem to give you access to pre-processor statements like `#define`. – Philipp Wendler Apr 25 '12 at 14:51
  • Thanks - I'm actually still struggling to create an IFile object. It looks a bit difficult when you're not using it inside a plugin environment. Also, I found that the C-Model described here (Using CDT APIs to programmatically introspect C/C++ code by Markus Schorn) http://wiki.eclipse.org/images/c/c7/CDT_APIs_for_code_introspection.pdf might be what I'm looking for. – PhilW Apr 26 '12 at 12:16
  • I added how we do this. Thanks for the link, I did not know those slides. Every bit of documentation about CDT is welcome, as docs are quite rare (at least we have the source). – Philipp Wendler Apr 26 '12 at 12:42
  • Yeah, that's what I found - the docs are there, but they don't say much about how things work together. Thanks for your update. I also found this http://dev.eclipse.org/mhonarc/lists/cdt-dev/msg17031.html regarding the preprocessor statements. I can get them, but not in a way that would be of any use to me, as I need to know what they encapsulate. I might add more comments here along the way. Thank you for your help so far! – PhilW Apr 26 '12 at 13:02
  • You talk about FileContent, which is fine, and you talk about IASTTranslationUnit, but I don't see how the two are related. Once I have a FileContent object, what do I do with it to turn it into the AST? The IASTTranslationUnit and its parent does not seem to have any functions that interact with FileContent. – searchengine27 Oct 31 '14 at 20:15
  • @searchengine27 As I wrote, you need the `ILanguage` instance (e.g., `GCCLanguage`). This has the method `getASTTranslationUnit` which converts `FileContent`s to `IASTTranslationUnit`s. – Philipp Wendler Oct 31 '14 at 20:28
  • I see, I think I misread. Lots of classes mentioned above...CDT is a bit intimidating. – searchengine27 Oct 31 '14 at 21:51
  • paper link is 303 Forbidden – Jason S Jun 08 '15 at 21:47
  • @JasonS Indeed, I removed it as I cannot find it anywhere else. – Philipp Wendler Jun 09 '15 at 05:42
  • For finding research papers, the most reliable source is scholar.google.com. For this paper, https://www.researchgate.net/profile/Alberto_Sillitti/publication/266483397_Using_the_Eclipse_CC_Development_Tooling_as_a_Robust_Fully_Functional_Actively_Maintained_Open_Source_C_Parser/links/550693800cf231de0777fec0.pdf is the link. – Taha Rehman Siddiqui Jun 01 '16 at 00:06
  • Does it support MSVC developed code? – Veno Jan 24 '23 at 13:44
7

Example using Eclipse CDT with only 2 jars.
- https://github.com/ricardojlrufino/eclipse-cdt-standalone-astparser
The example contains a class that displays the structure of the source file as a tree, and another that shows how to interact with the API.

One useful detail is that with this API (the Eclipse CDT parser) you can parse from a string in memory.

Another example of usage is:
https://github.com/ricardojlrufino/cplus-libparser
A library for extracting metadata (information about classes, methods, variables) from C/C++ source code.

See file: https://github.com/ricardojlrufino/cplus-libparser/blob/master/src/main/java/br/com/criativasoft/cpluslibparser/SourceParser.java

4

As mentioned already, CDT is perfect for this task. But unlike the approach described above, I used it from within a plugin and was able to use IFiles, which makes everything much easier. To get the "ITranslationUnit", just do:

ITranslationUnit tu = (ITranslationUnit) CoreModel.getDefault().create(myIFile);
IASTTranslationUnit ias = tu.getAST();

For example, I was looking for a particular #define, so I could just call:

IASTPreprocessorStatement[] ppc = ias.getAllPreprocessorStatements();

This returns all preprocessor statements as an array, one statement per element. Perfectly easy.
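To pick the #define statements out of that array, something like the following sketch might work. The class name MacroLister and the method printMacros are made up for this illustration; it assumes you already have an IASTTranslationUnit and only relies on the IASTPreprocessorMacroDefinition interface from CDT.

```java
import org.eclipse.cdt.core.dom.ast.IASTPreprocessorMacroDefinition;
import org.eclipse.cdt.core.dom.ast.IASTPreprocessorStatement;
import org.eclipse.cdt.core.dom.ast.IASTTranslationUnit;

// Hypothetical helper; requires the CDT JARs on the classpath.
public final class MacroLister {

    /** Prints the name and expansion of every #define in the translation unit. */
    public static void printMacros(IASTTranslationUnit ast) {
        for (IASTPreprocessorStatement statement : ast.getAllPreprocessorStatements()) {
            // Other statement kinds (#include, #ifdef, ...) are skipped.
            if (statement instanceof IASTPreprocessorMacroDefinition) {
                IASTPreprocessorMacroDefinition macro =
                        (IASTPreprocessorMacroDefinition) statement;
                System.out.println(macro.getName() + " = " + macro.getExpansion());
            }
        }
    }
}
```

For a macro such as `#define TYPE1` with no value, the expansion is expected to be an empty string.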

Matthias
2

You can try ANTLR. There should already be an existing C grammar available for it.
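If a full C grammar turns out to be more than the headers need, a much smaller hand-written reader may be enough. The following is a minimal sketch that does not use ANTLR at all: it assumes the header contains a single struct whose members are all of the form `char name[N];` as in the question, and that each data record starts with the 5-character type name. The class name CharStructReader and its methods are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, standard library only.
public class CharStructReader {

    /** One `char name[size];` member of a struct. */
    public static final class Field {
        public final String name;
        public final int size;

        Field(String name, int size) {
            this.name = name;
            this.size = size;
        }
    }

    // Matches declarations such as: char name1[10];
    private static final Pattern FIELD =
            Pattern.compile("\\bchar\\s+(\\w+)\\s*\\[\\s*(\\d+)\\s*\\]\\s*;");

    /** Collects all char-array members in the order they appear in the header text. */
    public static List<Field> parseFields(String header) {
        List<Field> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(header);
        while (m.find()) {
            fields.add(new Field(m.group(1), Integer.parseInt(m.group(2))));
        }
        return fields;
    }

    /** Slices a fixed-width record into field values, starting at the given offset. */
    public static Map<String, String> readRecord(List<Field> fields, String record, int offset) {
        Map<String, String> values = new LinkedHashMap<>();
        int pos = offset;
        for (Field f : fields) {
            values.put(f.name, record.substring(pos, pos + f.size));
            pos += f.size;
        }
        return values;
    }

    public static void main(String[] args) {
        String header = "typedef struct type1\n{\nchar name1[10];\nchar name2[5];\n}";
        String record = "type1ffffffffffaaaaa";
        List<Field> fields = parseFields(header);
        // Assumption: the record begins with the type name, so skip it.
        System.out.println(readRecord(fields, record, "type1".length()));
    }
}
```

This ignores preprocessor directives, nested structs, and every type other than char arrays, so a real grammar is the better choice as soon as the headers get more complex.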

Matej