11

Hi I am new in Compiler development, and am wondering how AST look like. I have a small section of code, and I use Clang for generating the AST. I don't get much information out of it. From the looks of it, the Syntax tree is exactly the same as the source, except for one struct that is added to almost any sample I test with.

Source:

class A {
public:
  int *a, *b, *c;
  int i;
  void sum() {
    a = new int[5];
    b = new int[5];
    c = new int[5];
    for (i = 0; i < 5; i++) {
      a[i] = i;
      b[i] = i;
    }
    for (i = 0; i < 5; i++) {
      c[i] = a[i] + b[i];
    }
    delete[] a;   delete[] b;   delete[] c;
  }
};

class B : public A {
};

int main() {
  B bclass; 
  bclass.sum();
  return 0;
} 

Command to generate AST:

clang++ -cc1 -ast-print ~/sum.cpp

AST output:

struct __va_list_tag {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
};
typedef struct __va_list_tag __va_list_tag;
class A {
public:
    int *a;
    int *b;
    int *c;
    int i;
    void sum()     {
        this->a = new int [5];
        this->b = new int [5];
        this->c = new int [5];
        for (this->i = 0; this->i < 5; this->i++) {
            this->a[this->i] = this->i;
            this->b[this->i] = this->i;
        }
        for (this->i = 0; this->i < 5; this->i++) {
            this->c[this->i] = this->a[this->i] + this->b[this->i];
        }
        delete [] this->a;
        delete [] this->b;
        delete [] this->c;
    }


};
class B : public A {
};
int main() {
    B bclass;
    bclass.sum();
    return 0;
}

Thanks

Sriram Murali
  • 5,804
  • 2
  • 26
  • 32
  • 6
    Just a note: you might want to try -ast-dump instead of -ast-print; that representation might be closer to what you're looking for. – servn Oct 30 '11 at 02:30
  • 2
    If the question is what does an AST look like, and not what Clang's AST looks like, you might find this answer useful: http://stackoverflow.com/questions/6376662/how-a-ast-for-an-object-oriented-programming-language-would-look-like/6378997#6378997 – Ira Baxter Oct 28 '11 at 21:51

3 Answers3

17

There is a small confusion between the various options available:

  • -ast-print will pretty-print the current AST, that is, it will render the code it understood as closely as possible to what it parsed (but making some things explicit, like the apparition of the this)
  • -ast-dump will generate a lisp-like representation of the current AST

The pretty printer can be useful to check that the AST is lossless (ie, preserved the const-ness of such expression, etc...) but is not really about development.

If you want to hack on the compiler, you need -ast-dump, which will generate an output that maps directly the in-memory representation of the code that was parsed.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
7

The AST is a linked structure in memory ("tree" does not make justice to the complexity of the thing, but it's the name people use). What -ast-print produces is a textual representation of the AST. Since the human who set the option is already familiar with C/C++-like syntax, it is printed in a representation that follows that syntax. This is a design choice, not a happy coincidence.

If you want to see what the AST looks like when it's not printed on purpose in a familiar syntax, you could for instance look at GIMPLE, GCC's internal representation.

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
  • Thanks Pascal. The reason why I tried to print the AST is to understand what clang is doing. I thought it would be a starting point for stepping through the creation of AST for better understanding, and finally to add newer types/ functions within Clang. I think I will have to find an alternative solution for that. – Sriram Murali Oct 28 '11 at 21:49
  • 3
    Warning: GIMPLE is difficult to understand and cumbersome to manipulate. – Alexandre C. Oct 28 '11 at 22:06
3

And if you want to play with GIMPLE, you could even use GCC MELT for that purpose. MELT is a high-level domain specific language to deal with GIMPLE!

And inside compilers, the internal representation are often not trees, but somehow circular structures. In GCC, a basic block knows it gimple-s, but the gimple-s may know their basic blocks.... (it is a bit more complex, but you've got the idea).

Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547