4

I'd like to start a project that involves transforming C code, but I'd like to include the preprocessor directives. I don't want to reinvent the wheel by writing my own C parser, so does anyone know of a front-end that can parse C preprocessor and C code, and produce an AST that can be used to re-generate (or pretty-print) the original source?

For example:

#define FILENAME "filename"
#include <stdio.h>

FILE *f=0;
...
if (file_is_open) {
#ifdef CAN_OPEN_IT
    f = fopen(FILENAME, "r");
#else
    printf("Unable to open file.\n");
#endif
}

The above code should be parsed into some in-memory representation that can be used to re-generate the source. In other words, it should not be processed as normal C in two phases, first processing the PP directives and then parsing pure C code. Rather it should represent the whole compile-time logic including the preprocessor variables.

Brian Tompsett - 汤莱恩
Steve

4 Answers

3

Take a look at Clang. (See http://clang.llvm.org/features.html#applications .)

Matthew Slattery
1

With the GNU gcc compiler, the flag required to pre-process the source is -E, as in gcc -E mysource.c. As for pretty printing it, there's indent; the documentation explaining its usage is a bit old, but nonetheless worthy of mention. There is also cflow, which can produce a map of the source.

Sorry if I misunderstood what you're looking for...

t0mm13b
  • Why the downvote? I mentioned indent and cflow... but the question is not exactly clear as to why the AST is needed when the context of the question included 'pretty print'. It would be nice for a downvoter to leave a comment explaining why, instead of staying silent, which is against the spirit of SO. – t0mm13b Jan 27 '10 at 00:01
  • Downvotes happen; they're a nuisance. Usually, they don't do irreparable damage to your reputation. – Jonathan Leffler Jan 27 '10 at 00:32
  • @Jonathan: Quick question, earlier I had 3 upvotes for http://stackoverflow.com/questions/2142796/in-linux-how-can-i-test-whether-the-output-of-a-program-is-going-to-a-live-termi/2142845#2142845 this, but is showing up as 5, instead of 30 why? – t0mm13b Jan 27 '10 at 00:35
  • Sorry if it wasn't clear. I'm looking for something that parses C and preprocessor code, not necessarily a pretty printer; I mentioned pretty printing only because a pretty printer probably parses the CPP code. What I want is something that will generate an AST that includes the CPP logic. I don't care about pretty printing per se. – Steve Jan 27 '10 at 01:25
  • @Steve: Ok, the best answer I can give is to look at Antlr's grammar list for parsing at http://www.antlr.org/grammar/list ... Using Antlr you can generate an AST, and it has multiple language interfaces, i.e. C#, C, CPP and Java can use the Antlr libraries for parsing, if that's what you are looking for... :) – t0mm13b Jan 27 '10 at 01:42
  • @tommieb75: regarding your '5 instead of 30'; I'd guess you reached your 200 limit for the day - after which you get a Mortarboard badge and no more points. – Jonathan Leffler Jan 27 '10 at 04:05
  • If "thanks" has been in your answer for 5 years, you're probably better off just leaving it there than editing it out. – S.S. Anne Oct 15 '19 at 22:38
1

Our DMS Software Reengineering Toolkit has a C front end (and a C++ front end) that:

  • parses (compilable) C source code in a variety of dialects into ASTs,
  • preserves the preprocessor directives in most cases as AST nodes,
  • can regenerate compilable C code (with comments and preprocessor directives) from the ASTs,
  • can collect thousands of files in a single image to allow cross-file analysis and transformation,
  • provides full symbol table construction and access
  • provides procedural access to ASTs with a large AST manipulation library, including navigate, inspect, insert, delete, replace, match, ...
  • provides source-to-source transformations using patterns written in the C notation that match against the ASTs

For C (not yet for C++), DMS also provides:

  • control and data flow analysis
  • local and global points-to analysis
  • global call graph construction

DMS has been used to process extremely large C applications for the purposes of extracting facts and generating new, derived code from the original source base.

(EDIT: Feb 2016)

It can handle the OP's example (with slight fixes to make it valid). Here's the slightly revised source:

#define FILENAME "filename"
#include <stdio.h>

FILE *f;
main() {
  f=0;
if (file_is_open) {
#ifdef CAN_OPEN_IT
    f = fopen(FILENAME, "r");
#else
    printf("Unable to open file.\n");
#endif
}

}

Here is the AST produced:

C~GCC4 Domain Parser Version 3.0.1(28449)
Copyright (C) 1996-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit
AST Optimizations: remove constant tokens, remove unary productions, compact sequences
Using encoding Unicode-UTF-8?ANSI +CRLF +1 /^I
(translation_unit@C~GCC4=2#4a7e0e0^0 Line 1 Column 1 File C:/temp/test.c
 (declaration_seq@C~GCC4=605#4a77580^1#4a7e0e0:1 {4} Line 1 Column 1 File C:/temp/test.c
  (control_line@C~GCC4=1094#4a775c0^1#4a77580:1 Line 1 Column 1 File C:/temp/test.c
   ('#'@C~GCC4=1548#4a771c0^1#4a775c0:1[Keyword:0] Line 1 Column 1 File C:/temp/test.c)'#'
   (IDENTIFIER@C~GCC4=1531#4a77200^1#4a775c0:2[`FILENAME'] Line 1 Column 9 File C:/temp/test.c)IDENTIFIER
   (<!MacroDefinition>@C~GCC4=1603#4a77180^2#4a775c0:3#4a7f300:1[`FILENAME'] Line 1 Column 18 File C:/temp/test.c
$VOID$ [Child 1]
   |(STRING_LITERAL@C~GCC4=1525#4a77160^2#4a77180:2#4a7f300:2[`filename'] Line 1 Column 18 File C:/temp/test.c)STRING_LITERAL
$VOID$ [Child 3]
   )<!MacroDefinition>#4a77180
   (new_line@C~GCC4=1578#4a77260^1#4a775c0:4[Keyword:0] Line 1 Column 28 File C:/temp/test.c)new_line
  )control_line#4a775c0
  (control_line@C~GCC4=1104#4a77460^1#4a77580:2 Line 2 Column 1 File C:/temp/test.c
   ('#'@C~GCC4=1548#4a77340^1#4a77460:1[Keyword:0] Line 2 Column 1 File C:/temp/test.c)'#'
   (ANGLED_HEADER_NAME@C~GCC4=1589#4a77380^1#4a77460:2[`stdio.h'] Line 2 Column 10 File C:/temp/test.c)ANGLED_HEADER_NAME
   (new_line@C~GCC4=1578#4a773c0^1#4a77460:3[Keyword:0] Line 2 Column 19 File C:/temp/test.c)new_line
  )control_line#4a77460
  (simple_declaration@C~GCC4=631#4a774c0^1#4a77580:3 Line 4 Column 1 File C:/temp/test.c
   (IDENTIFIER@C~GCC4=1531#4a77360^1#4a774c0:1[`FILE'] Line 4 Column 1 File C:/temp/test.c)IDENTIFIER
   (declarator@C~GCC4=850#4a77520^1#4a774c0:2 Line 4 Column 6 File C:/temp/test.c
   |(ptr_operator@C~GCC4=866#4a77560^1#4a77520:1 Line 4 Column 6 File C:/temp/test.c)ptr_operator
   |(IDENTIFIER@C~GCC4=1531#4a77480^1#4a77520:2[`f'] Line 4 Column 7 File C:/temp/test.c)IDENTIFIER
   )declarator#4a77520
  )simple_declaration#4a774c0
  (function_definition@C~GCC4=966#4a77be0^1#4a77580:4 Line 5 Column 1 File C:/temp/test.c
   (direct_declarator@C~GCC4=852#4a77440^1#4a77be0:1 Line 5 Column 1 File C:/temp/test.c
   |(IDENTIFIER@C~GCC4=1531#4a774e0^1#4a77440:1[`main'] Line 5 Column 1 File C:/temp/test.c)IDENTIFIER
   |(parameter_declaration_clause@C~GCC4=900#4a77220^1#4a77440:2 Line 5 Column 6 File C:/temp/test.c)parameter_declaration_clause
   )direct_declarator#4a77440
   (compound_statement@C~GCC4=507#4a77b20^1#4a77be0:2 Line 5 Column 8 File C:/temp/test.c
   |(statement_seq@C~GCC4=511#4a77d20^1#4a77b20:1 {2} Line 6 Column 3 File C:/temp/test.c
   | (AMBIGUITY<statement=358>@C~GCC4=1602#4a77680^1#4a77d20:1{2} Line 6 Column 3 File C:/temp/test.c
   |  (expression_statement@C~GCC4=503#4a7e040^1#4a77680:1 Line 6 Column 3 File C:/temp/test.c
   |   (assignment_expression@C~GCC4=457#4a77f00^1#4a7e040:1 Line 6 Column 3 File C:/temp/test.c
   |   |(assignment_target@C~GCC4=470#4a77a00^1#4a77f00:1 Line 6 Column 3 File C:/temp/test.c
   |   | (IDENTIFIER@C~GCC4=1531#4a77400^2#4a77a00:1#4a77fc0:1[`f'] Line 6 Column 3 File C:/temp/test.c)IDENTIFIER
   |   |)assignment_target#4a77a00
   |   |(INT_LITERAL@C~GCC4=1471#4a77a60^2#4a77f00:2#4a77f60:1[0] Line 6 Column 5 File C:/temp/test.c)INT_LITERAL
   |   )assignment_expression#4a77f00
   |  )expression_statement#4a7e040
   |  (simple_declaration@C~GCC4=630#4a7e060^1#4a77680:2 Line 6 Column 3 File C:/temp/test.c
   |   (init_declarator@C~GCC4=835#4a77fc0^1#4a7e060:1 Line 6 Column 3 File C:/temp/test.c
   |   |(IDENTIFIER@C~GCC4=1531#4a77400^2... [ALREADY PRINTED] ...)
   |   |(initializer@C~GCC4=983#4a77f60^1#4a77fc0:2 Line 6 Column 4 File C:/temp/test.c
   |   | (INT_LITERAL@C~GCC4=1471#4a77a60^2... [ALREADY PRINTED] ...)
   |   |)initializer#4a77f60
   |   )init_declarator#4a77fc0
   |  )simple_declaration#4a7e060
   | )AMBIGUITY#4a77680
   | (selection_statement@C~GCC4=527#4a77b40^1#4a77d20:2 Line 7 Column 1 File C:/temp/test.c
   |  (IDENTIFIER@C~GCC4=1531#4a7e0c0^1#4a77b40:1[`file_is_open'] Line 7 Column 5 File C:/temp/test.c)IDENTIFIER
   |  (compound_statement@C~GCC4=507#4a77ae0^1#4a77b40:2 Line 7 Column 19 File C:/temp/test.c
   |   (statement@C~GCC4=490#4a7f840^1#4a77ae0:1 Line 8 Column 1 File C:/temp/test.c
   |   |(if_directive@C~GCC4=1088#4a7f1c0^1#4a7f840:1 Line 8 Column 1 File C:/temp/test.c
   |   | ('#'@C~GCC4=1548#4a7f240^1#4a7f1c0:1[Keyword:0] Line 8 Column 1 File C:/temp/test.c)'#'
   |   | (IDENTIFIER@C~GCC4=1531#4a7ee60^1#4a7f1c0:2[`CAN_OPEN_IT'] Line 8 Column 8 File C:/temp/test.c)IDENTIFIER
   |   | (new_line@C~GCC4=1578#4a7f1e0^1#4a7f1c0:3[Keyword:0] Line 8 Column 19 File C:/temp/test.c)new_line
   |   |)if_directive#4a7f1c0
   |   |(AMBIGUITY<statement=358>@C~GCC4=1602#4a77d40^1#4a7f840:2{2} Line 9 Column 5 File C:/temp/test.c
   |   | (expression_statement@C~GCC4=503#4a7f4a0^1#4a77d40:1 Line 9 Column 5 File C:/temp/test.c
   |   |  (assignment_expression@C~GCC4=457#4a7f3c0^1#4a7f4a0:1 Line 9 Column 5 File C:/temp/test.c
   |   |   (assignment_target@C~GCC4=470#4a7eec0^1#4a7f3c0:1 Line 9 Column 5 File C:/temp/test.c
   |   |   |(IDENTIFIER@C~GCC4=1531#4a7eee0^2#4a7eec0:1#4a7f400:1[`f'] Line 9 Column 5 File C:/temp/test.c)IDENTIFIER
   |   |   )assignment_target#4a7eec0
   |   |   (postfix_expression@C~GCC4=201#4a7f2e0^1#4a7f3c0:2 Line 9 Column 9 File C:/temp/test.c
   |   |   |(IDENTIFIER@C~GCC4=1531#4a7f120^2#4a7f2e0:1#4a7f160:1[`fopen'] Line 9 Column 9 File C:/temp/test.c)IDENTIFIER
   |   |   |(expression_list@C~GCC4=228#4a7f260^2#4a7f2e0:2#4a7f160:2 Line 9 Column 15 File C:/temp/test.c
   |   |   | (<!MacroCall>@C~GCC4=1607#4a7f300^1#4a7f260:1[`FILENAME'] Line 9 Column 15 File C:/temp/test.c
   |   |   |  (<!MacroDefinition>@C~GCC4=1603#4a77180^2... [ALREADY PRINTED] ...)
   |   |   |  (STRING_LITERAL@C~GCC4=1525#4a77160^2... [ALREADY PRINTED] ...)
   |   |   |  $VOID$ [Child 3]
   |   |   |  (STRING_LITERAL@C~GCC4=1525#4a7f2c0^1#4a7f300:4[`filename'] Line 1 Column 18 File C:/temp/test.c)STRING_LITERAL
   |   |   |  $VOID$ [Child 5]
   |   |   | )<!MacroCall>#4a7f300
   |   |   | (STRING_LITERAL@C~GCC4=1525#4a7f140^1#4a7f260:2[`r'] Line 9 Column 25 File C:/temp/test.c)STRING_LITERAL
   |   |   |)expression_list#4a7f260
   |   |   )postfix_expression#4a7f2e0
   |   |  )assignment_expression#4a7f3c0
   |   | )expression_statement#4a7f4a0
   |   | (simple_declaration@C~GCC4=630#4a7f480^1#4a77d40:2 Line 9 Column 5 File C:/temp/test.c
   |   |  (init_declarator@C~GCC4=835#4a7f400^1#4a7f480:1 Line 9 Column 5 File C:/temp/test.c
   |   |   (IDENTIFIER@C~GCC4=1531#4a7eee0^2... [ALREADY PRINTED] ...)
   |   |   (initializer@C~GCC4=983#4a7f3e0^1#4a7f400:2 Line 9 Column 7 File C:/temp/test.c
   |   |   |(postfix_expression@C~GCC4=201#4a7f160^1#4a7f3e0:1 Line 9 Column 9 File C:/temp/test.c
   |   |   | (IDENTIFIER@C~GCC4=1531#4a7f120^2... [ALREADY PRINTED] ...)
   |   |   | (expression_list@C~GCC4=228#4a7f260^2... [ALREADY PRINTED] ...)
   |   |   |)postfix_expression#4a7f160
   |   |   )initializer#4a7f3e0
   |   |  )init_declarator#4a7f400
   |   | )simple_declaration#4a7f480
   |   |)AMBIGUITY#4a77d40
   |   |(else_directive@C~GCC4=1091#4a7f4c0^1#4a7f840:3 Line 10 Column 1 File C:/temp/test.c
   |   | ('#'@C~GCC4=1548#4a7f500^1#4a7f4c0:1[Keyword:0] Line 10 Column 1 File C:/temp/test.c)'#'
   |   | (new_line@C~GCC4=1578#4a7f4e0^1#4a7f4c0:2[Keyword:0] Line 10 Column 6 File C:/temp/test.c)new_line
   |   |)else_directive#4a7f4c0
   |   |(expression_statement@C~GCC4=503#4a7f7c0^1#4a7f840:4 Line 11 Column 5 File C:/temp/test.c
   |   | (postfix_expression@C~GCC4=201#4a77ba0^1#4a7f7c0:1 Line 11 Column 5 File C:/temp/test.c
   |   |  (IDENTIFIER@C~GCC4=1531#4a7f640^1#4a77ba0:1[`printf'] Line 11 Column 5 File C:/temp/test.c)IDENTIFIER
   |   |  (STRING_LITERAL@C~GCC4=1525#4a77c20^1#4a77ba0:2[`Unable to open file.
'] Line 11 Column 12 File C:/temp/test.c)STRING_LITERAL
   |   | )postfix_expression#4a77ba0
   |   |)expression_statement#4a7f7c0
   |   |(endif_directive@C~GCC4=1092#4a7f7e0^1#4a7f840:5 Line 12 Column 1 File C:/temp/test.c
   |   | ('#'@C~GCC4=1548#4a7f720^1#4a7f7e0:1[Keyword:0] Line 12 Column 1 File C:/temp/test.c)'#'
   |   | (new_line@C~GCC4=1578#4a7f700^1#4a7f7e0:2[Keyword:0] Line 12 Column 7 File C:/temp/test.c)new_line
   |   |)endif_directive#4a7f7e0
   |   )statement#4a7f840
   |  )compound_statement#4a77ae0
   | )selection_statement#4a77b40
   |)statement_seq#4a77d20
   )compound_statement#4a77b20
  )function_definition#4a77be0
 )declaration_seq#4a77580
)translation_unit#4a7e0e0

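You can see the preprocessor conditional preserved in the tree: the "if_directive" node for source line 8, followed by "else_directive" (line 10) and "endif_directive" (line 12).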

Yes, DMS can prettyprint this tree, too. The following command runs the parser to produce an AST, and then runs the DMS prettyprinter to regenerate source solely from the tree. The round-trip is accurate; you can recompile and get the same result. Comments are preserved, too.

C:\DMS\Domains\C\GCC4\Tools\PrettyPrinter>run domainprettyprinter \temp\test.c
C~GCC4 PrettyPrinter Version 1.2.13
Copyright (C) 2004-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit

#define FILENAME "filename"
#include <stdio.h>
FILE *f;

main()
{
  f = 0;
  if (file_is_open)
    {
      #ifdef CAN_OPEN_IT
        f = fopen(FILENAME, "r");
      #else
        printf("Unable to open file.\n");
      #endif
    }
}

You can see how DMS handles C++. At this point it handles all of C++14 for GCC and MS dialects.

Ira Baxter
-1

You can take a look at http://www.antlr.org/wiki/display/ANTLR3/ANTLR3+Code+Generation+-+C

kiranputtur
  • This seems to be about (ANTLR) parser generators that produce parsers implemented in C. The OP wants something that *parses* C. Did I miss something? – Ira Baxter Feb 01 '10 at 04:48