
I'm looking for a C source code parser that can create a comprehensive AST from it.

Preferably a Java library (I'd rather not have to use Python here -> http://code.google.com/p/pycparser/)

3 Answers

The Eclipse CDT project has a C parser written in Java, see my answer to a similar question.

Community
Philipp Wendler

It may well be easiest for you to use ANTLR and get it to generate an AST based on an existing ANTLR grammar, e.g. the source.

ANTLR has a Java API here: http://www.antlr.org/api/Java/index.html
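For a sense of what "generating an AST" means in practice, here is a minimal hand-written sketch in Java. It is not ANTLR and does not use ANTLR's API; the class name `TinyCExprParser` and the `Node` type are hypothetical names chosen for illustration. It assumes a toy subset of C expressions only (identifiers, integer literals, `+ - * /` and parentheses) and handles no preprocessor directives, declarations, or statements.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy recursive-descent parser for a tiny subset of C expressions (illustration only). */
public class TinyCExprParser {

    /** AST node: a leaf (identifier or number) or a binary operator with two children. */
    public static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        Node(String op, Node left, Node right) {
            this(op);
            children.add(left);
            children.add(right);
        }

        /** Prints the tree in prefix form, e.g. (+ a b). */
        @Override public String toString() {
            if (children.isEmpty()) return label;
            return "(" + label + " " + children.get(0) + " " + children.get(1) + ")";
        }
    }

    private final String src;
    private int pos = 0;

    private TinyCExprParser(String src) { this.src = src; }

    /** Parses the whole input and returns the root of the AST. */
    public static Node parse(String src) {
        TinyCExprParser p = new TinyCExprParser(src);
        Node root = p.expr();
        p.skipSpaces();
        if (p.pos != src.length()) {
            throw new IllegalArgumentException("Unexpected input at offset " + p.pos);
        }
        return root;
    }

    // expr := term (('+' | '-') term)*
    private Node expr() {
        Node left = term();
        while (peekIs('+') || peekIs('-')) {
            String op = String.valueOf(src.charAt(pos++));
            left = new Node(op, left, term());
        }
        return left;
    }

    // term := factor (('*' | '/') factor)*
    private Node term() {
        Node left = factor();
        while (peekIs('*') || peekIs('/')) {
            String op = String.valueOf(src.charAt(pos++));
            left = new Node(op, left, factor());
        }
        return left;
    }

    // factor := NAME_OR_NUMBER | '(' expr ')'
    private Node factor() {
        if (peekIs('(')) {
            pos++;
            Node inner = expr();
            if (!peekIs(')')) throw new IllegalArgumentException("Expected ')' at offset " + pos);
            pos++;
            return inner;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && isWordChar(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected operand at offset " + pos);
        return new Node(src.substring(start, pos));
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Skips whitespace, then reports whether the next character equals c. */
    private boolean peekIs(char c) {
        skipSpaces();
        return pos < src.length() && src.charAt(pos) == c;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(parse("a + 2 * (b - 3)")); // prints (+ a (* 2 (- b 3)))
    }
}
```

A generated parser would produce a comparable tree from a grammar file instead of hand-written methods; covering full C rather than this expression subset is a much larger job.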

michaelbn
snim2
  • You might be able to build a C parser; certainly ANTLR builds fine parsers in the narrow sense. But what OP wants appears to be something that reads real C programs. The detail work to get this right is much bigger than you think: preprocessor, dialects (C89, C99, C11, ...), vendor variants (GCC, MSVC, GreenHills), character sets. You are better off getting one that already does all this rather than trying to re-invent it yourself. – Ira Baxter Jul 03 '15 at 12:49

Our C Front End is not in Java, definitely not in Python :-} but provides robust parsers for many real dialects of C code. It goes beyond building just ASTs; it provides a preprocessor, symbol tables, local and global flow analyses, which you will need if you want to do anything to C other than just "having an AST".

It comes built on top of our DMS Software Reengineering Toolkit, which provides the infrastructure for parsing and flow analysis, can apply transforms to the AST using patterns, and can regenerate valid source code.

EDIT July 2015 (in response to comment): DMS itself is implemented in a parallel programming language, PARLANSE, which is C-like in capability but includes fine-grain parallelism constructs as well as exception handling. DMS provides a set of DSLs for defining language processing: a fully Unicode-capable lexer, BNF for grammars, attribute grammars for computing tree-shaped analyses, and source-to-source transformations useful for pattern recognition and rewriting the source code.

Ira Baxter
  • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. – Tony Hopkinson Jul 04 '15 at 13:10
  • @TonyHopkinson: What part of "*looking for a C source code parser*" does this not answer directly? Where's the critique in this answer? – Ira Baxter Jul 04 '15 at 13:37
  • Looked over the top to me mate. Guy also asked for Java, and you didn't say what it was built in. Built in our .. also made it a tad iffy in my eyes. If it was me I'd have done this as a "you might want to look at" comment. PS I didn't do the downvote... – Tony Hopkinson Jul 06 '15 at 09:45
  • Good work, there's even AS3 –  Jul 21 '17 at 22:09