4

For my finale year project I'm learning about compiler techniques, and currently I'm trying to experiment with the GCC intermediate representation (raw GIMPLE) and getting the control flow graphs from different source files (C, Cpp and Java) using GCC-5.4.

So far i can generate *.004t.gimple and *.011t.cfg raw files using -fdump-tree-all-graph-raw but later I'm looking to understand more the GIMPLE language so i searched for its grammar and i have found this :

So the language seems to be constantly changing and have multiple formats (High level GIMPLE, Low_level_GIMPLE, SSA GIMPLE, tree) and also the grammar seems to keep changing between versions but i can't find the GIMPLE grammar for the recent versions and specifically the one used in GCC-5.4 and i can't understand the different formats.

Questions about the grammar :

  • where can i find the GIMPLE grammar used in GCC-5.4 and more recent versions?
  • how is it written ? (in BNF or EBNF or ...)
  • How does GCC implement this grammar to generate, parse and understand Gimple files it generates and later transform them to RTL?
  • is it possible for me to write a small subset of the GIMPLE grammar in Xtext from examples of *.004t.gimple files that i generate?

Questions about the formats:

  • What's the difference between the 3 Gimple formats? (i can't seem to find detailed documentation about each one in the wiki)
  • which format is used in the raw files *.c.004t.gimple and *.c.011t.cfg ? (High or Low, ...)
  • which one represents better the control flow from the original source code without optimizations ?

Thank You,

  • _where can i find the GIMPLE grammar..._ I don't think, any such definition exists. A very simple overview is given in Uday Khedeker's [slides](https://www.cse.iitb.ac.in/~uday/courses/cs715-09/gcc-gimple.pdf) on page 33. – Vladislav Ivanishin Jul 19 '19 at 12:03
  • _How does GCC implement this grammar to generate, parse and understand Gimple files it generates..._ GCC does not normally output gimple to files (you can require a dump with e.g. `-fdump-tree-all`, sure, but that is meant for compiler hackers and not needed for operation); gimple structures are genereted by a language frontend (and the _gimplifier_), and they are kept and processed _in memory_, rtl is later made from that. Recently a GIMPLE frontend was added (in 7.2, IIRC), but that is by no means complete (and again, its purpose is to make possible writing unit tests for the compiler). – Vladislav Ivanishin Jul 19 '19 at 12:08
  • Oh, also I've found "rough GIMPLE grammar" in the _GENERIC and GIMPLE: A New Tree Representation for Entire Functions_ paper by Jason Merrill – Vladislav Ivanishin Jul 19 '19 at 12:23
  • @VladislavIvanishin thanks for the slides, i know that there is a "rough GIMPLE grammar" from the 1st paper and in GCC-4.3.6 and GCC-4.2.1 as you can find in my links but i'm looking for a more recent one – user7724084 Jul 19 '19 at 15:22
  • or maybe they deleted the grammar from the docs because they don't use it anymore?!! so how does it work now ?!! how do they produce the dump files from the in memory structures and does the dumps represent the source files completely ? – user7724084 Jul 19 '19 at 15:33
  • _how do they produce the dump files from the in memory structures_ — well, they just walk the gimple trees. I think, the `dump_function_to_file()` function is a good entry point (you can read/debug from there). _and does the dumps represent the source files completely_ — I think, there's no guarantee, because like I said, these dumps are mainly used for debugging and testing. Also, to be pedantic, some information about the source files is always lost during compilation (but of course enough is preserved to correctly reflect the semantics of the input program). – Vladislav Ivanishin Jul 19 '19 at 15:48
  • @VladislavIvanishin it seems that GIMPLE used to have a grammar at the beginning but later it received so many changes and the grammar didn't keep up. and yes it does not represent the source code as it should (explained in these notes (http://nkavvadias.com/hercules/gimple-notes.html ). also by reading the source code it doesn't seem to be possible to make something that understands (parse) GIMPLE in its generated textual format since it doesn't seem to have a fixed structure (or grammar) but i still need to do it because it seems to contain enough information for what i want to do. – user7724084 Aug 07 '19 at 17:25

1 Answers1

1

It looks like you just starting to learn GIMPLE and did not even read documents you`re posted above. I am digging in depth of GCC for some time and I will try to answer your questions.

  1. Anyway you need to read gccint document lays here: https://gcc.gnu.org/onlinedocs/gccint.pdf it helps to answer some questions and gives some info about GIMPLE, and this is the only document where GIMPLE is described at least somehow. The best description in sources, it is sad but as is. Look also here, http://www.netgull.com/gcc/summit/2003/GENERIC%20and%20GIMPLE.pdf, this document based on gccint and consist of some extract from.

  2. There is no "GIMPLE grammar" described in a clear way, like C language, just look in sources, maybe some poor examples on the internet.

  3. I think it is generated from Tree-adjoining grammar(TAG), based on SIMPLE IL used by the McCAT compiler project at McGill University [SIMPLE].

  4. How GCC implement and understand? And again you need to look in depths of GCC, gimple.h, basic-block.h, tree-pass.h for example, all of these lays in $src/gcc/. Some part of the functions is described in gccint in section GIMPLE. The reference gccint is not exactly accurate, it consists of some outdated functions and references, you must remember that(FOR_EACH_BB for example, deprecated in 2013).

  5. About Xtext, I never used that, and I do not understand the need to write some GIMPLE yourself, which is intermediate language IL you can create a plugin for optimizing your code flow, but I can not see the need to use GIMPLE separately.

About format.

  1. There is one GIMPLE format, but it can have two forms AFAIK. GIMPLE HIGH it is just GIMPLE that is not fully lowered and consists of the IL before the pass pass_lower_cf. High GIMPLE contains some container statements like lexical scopes (represented by GIMPLE_BIND) and nested expressions (e.g., GIMPLE_TRY). Low GIMPLE exposes all of the implicit jumps for control and exception expressions directly in the IL and EH region trees(EH means Exception Handling). There is also RAW representation, it is some kind of polish notation as I understand, IMO it more useful than usual representation, you can get it with -fdump-tree-all-all-raw for example.

  2. *.c.004t.gimple - this is the first step of GIMPLE appear, *.c.011t.cfg - first attempt for control flow graph(cfg). The internal name of GIMPLE lower is "lower" you can see them in gimple-low.c in section

    const pass_data pass_data_lower_cf =
    {
      GIMPLE_PASS, /* type */
      "lower", /* name */
      OPTGROUP_NONE, /* optinfo_flags */
      TV_NONE, /* tv_id */
      PROP_gimple_any, /* properties_required */
      PROP_gimple_lcf, /* properties_provided */
      0, /* properties_destroyed */
      0, /* todo_flags_start */
      0, /* todo_flags_finish */
    };
    

    You can use search and find that this pass is *.c.007t.lower

  3. The answer is above I think, I am using RAW representation it is more informative IMO.

It not much, but I hope it helps you with your GCC exploration, and sorry for my bad "Engrish".

Alexy Khilaev
  • 415
  • 3
  • 13
  • So after going through the source files, i understand better how the GIMPLE files are generated but still this doesn't help me read (parse) these files since i trying to extract some info that exists in the files and maybe even translate them to another language like in this. (http://nkavvadias.com/hercules/#gimple2nac) – user7724084 Aug 08 '19 at 16:04
  • try to read about GIMPLE and CFG in gccint, and at the same time try to look into gimple passes generated by -fdump-tree-all-lineno-raw, this is my method, and at this time I just added a pass that handles the situation i need. Try to ask more concretely. – Alexy Khilaev Aug 14 '19 at 09:13