5

Is there a C grammar available that generates an AST, with all the parser rules annotated using the "^" and "!" notations?

I went through Terence Parr's book to learn how to write such a grammar, but writing one for the C language seems to be a time-consuming process, so I was wondering whether one is already available, which would save me a lot of time!

(A grammar for a smaller subset of the C language is also fine.)

Thanks :)

Hari Krishna
  • @bart, I think you are mistaken; this is not Vinod, sorry :) – Hari Krishna Apr 14 '11 at 19:07
  • @bart, haha, I don't know why you had that doubt. Anyway, I saw the other profile you mentioned; at least I have included more info about myself, which I cannot do just to have a different identity on this site :) – Hari Krishna Apr 14 '11 at 19:13
  • Okay, I'll remove my comments in that case. Good luck. – Bart Kiers Apr 14 '11 at 19:14
  • 1
    Did you check the ANTLR.org site? I'd swear I've seen a C grammar there. I doubt it handles preprocessor directives. I know I've seen a C++ grammar, but it wasn't really quite right. – Ira Baxter Apr 14 '11 at 20:21
  • 4
    Interestingly enough, Terence Parr wrote an ANTLR C grammar. Is http://www.antlr.org/grammar/1153358328744/C.g what you're looking for? – Rafe Kettler Apr 15 '11 at 19:08
  • @Rafe Kettler, no, that grammar does not produce an AST, it just creates a "flat" parse tree. – Bart Kiers Apr 15 '11 at 20:42
  • @Rafe, again Bart is correct :) I'm looking for a C grammar that can produce an AST in a proper format! – Hari Krishna Apr 16 '11 at 13:49

2 Answers

2

See this. It's straight from the ANTLR 4 source repo: a C11 grammar. It looks pretty compliant.

Of course, it doesn't come with a preprocessor, but handing cpp or mcpp the file first is easy enough.

It also doesn't come with AST rules, but adding them doesn't look too hard (albeit time-consuming).

kirbyfan64sos
  • Hmm. The grammar appears to accept X*Y both as an expressionstmt and as a declaration; there's nothing that obviously discriminates the two cases. I don't think ANTLR4 handles (captures) ambiguity, so this can't be right. This is an old problem with parsing C and C++: see http://stackoverflow.com/questions/243383/why-cant-c-be-parsed-with-a-lr1-parser/1004737#1004737 It might be possible to repair this grammar by adding the usual hack seen in LALR parsers for C. – Ira Baxter Jan 28 '15 at 03:40
  • @IraBaxter isn't it supposed to take the branch coming first? – vines Nov 22 '15 at 00:07
  • @vines: Assuming ANTLR "takes the first branch", no matter which way the grammar rules for X * Y are written, the actual semantic interpretation may be the opposite (read my referenced answer carefully). What this means is it can't possibly get the right interpretation all the time... thus it will sometimes parse the program incorrectly. The only way out is to accept *both* parses as valid, or to eliminate the "wrong" parse at the moment it is being proposed. I don't believe ANTLR can do the first, or that the second can be done without implementing the same awful hack of old C parsers. – Ira Baxter Nov 27 '15 at 08:45
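
To make the X * Y discussion in the comments above concrete, here is a minimal C sketch. The first two functions show the same token sequence compiling as a declaration or as a multiplication, depending only on what X names. The second half sketches the kind of hack the comments allude to, assuming it means the common approach of letting the lexer consult a table of typedef names. The names `typedef_table`, `record_typedef`, and `classify_identifier` are hypothetical and do not come from ANTLR or from any grammar mentioned here; a real parser would also need scoping.

```c
#include <stddef.h>
#include <string.h>

/* Reading 1: X names a type, so "X * Y;" declares Y as a pointer to X. */
static int as_declaration(void) {
    typedef int X;
    int value = 7;
    X * Y;              /* declaration of Y with type int* */
    Y = &value;
    return *Y;
}

/* Reading 2: X names a variable, so "X * Y;" is a multiplication. */
static int as_expression(void) {
    int X = 6, Y = 7;
    X * Y;              /* expression statement; the product is discarded */
    return X * Y;
}

/* Hypothetical flat typedef table standing in for a scoped symbol table. */
enum token_kind { TOKEN_IDENTIFIER, TOKEN_TYPE_NAME };

struct typedef_table {
    const char *names[16];
    size_t count;
};

/* Called by the parser when it finishes a typedef declaration. */
static void record_typedef(struct typedef_table *table, const char *name) {
    if (table->count < sizeof table->names / sizeof table->names[0]) {
        table->names[table->count++] = name;
    }
}

/* Called by the lexer so the parser sees two distinct token kinds. */
static enum token_kind classify_identifier(const struct typedef_table *table,
                                           const char *name) {
    for (size_t i = 0; i < table->count; i++) {
        if (strcmp(table->names[i], name) == 0) {
            return TOKEN_TYPE_NAME;
        }
    }
    return TOKEN_IDENTIFIER;
}
```

With such a table, "X" is classified as a plain identifier until `record_typedef` has seen it, and as a type name afterwards, which is what lets a grammar tell the two readings apart. A compiler may warn about the discarded product in `as_expression`; that statement is kept on purpose because it is the ambiguous form.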
1

No answers after two weeks.

You are right: building a full parser that builds complete ASTs and handles all the details of C (including the preprocessor), covering a variety of dialects of C (e.g., ANSI, GNU C 2/3/4, Microsoft Visual C, Green Hills C), is actually a lot of work. And unless you invest this work, it won't process any real C programs.

I would expect there to be a full ANTLR grammar for C that does this, considering how old ANTLR is. It is surprising that nobody here seems able to identify one; certainly you'd expect to find it at the ANTLR site.

We've put the energy required into building such C parsers (covering all the above dialects), and added symbol table construction, control and data flow extraction, call graph construction, analyzers, and tree transformations, in the DMS Software Reengineering Toolkit with its C front end. This front end has been applied to C applications comprising 18,000 compilation units to build custom analysis tools.

Ira Baxter