0

I have decided to start studying compiler theory. To follow along, I want a compiler (for any language) whose source I can read and that lets me inspect the output of each stage:

  • lexical analyzer output.
  • syntax tree.
  • intermediate representation.
  • code generation.
  • I don't care about optimization right now.

I am aware of similar questions about clang and gcc, and I understand that both do lexical and syntax analysis on the fly. I just want any compiler, for any language, as long as the compiler itself is written in C and runs on Ubuntu x64.

u185619
  • No, I have no idea what lex and YACC are, but I have heard of bison (it is a parser generator) – u185619 Mar 13 '15 at 08:09
  • Maybe some internals of LLVM would help you, there is a recent book: _Getting Started with LLVM Core Libraries_ – usr1234567 Mar 13 '15 at 08:21
  • I believe that clang is part of LLVM, and clang only offers: -E (run the preprocessor stage); -fsyntax-only (run the preprocessor, parser and type checking stages); -S (run the previous stages as well as LLVM generation and optimization stages and target-specific code generation, producing an assembly file); -c (run all of the above, plus the assembler, generating a target ".o" object file). No lexical analyser step or AST is shown – u185619 Mar 13 '15 at 08:24
  • For practical reasons, and to be able to focus on more modern languages and best practices, I'd recommend scratching the "_compiler itself is written in C_" requirement. It eliminates a wide range of interesting languages and their [bootstrapping compilers](http://en.wikipedia.org/wiki/Bootstrapping_%28compilers%29) – xmojmr Mar 13 '15 at 08:34

3 Answers

2

I am not sure you have the right approach if you want to learn about compilation techniques for C specifically. C is also not the best language to write a compiler in (if you start from scratch, OCaml is better suited to that task). BTW, recent Clang/LLVM and GCC are coded in C++ (no longer in C).

The C language now sort-of requires optimization, as I explained here, so skipping the optimization part is not very useful. Be aware that optimization passes make up the majority, and the most difficult part, of real-world compilers.

The lexing and parsing parts of a compiler are now well understood, and there are several code generator tools for them (yacc or bison, lex or flex, ANTLR...). For several pragmatic reasons, real compilers like GCC don't use these tools.
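To give a feel for what a hand-written front end looks like without yacc or lex, here is a minimal sketch of a lexer and a recursive-descent parser. It is written in Python for brevity, and the tiny arithmetic grammar, the token kinds (`NUM`, `ID`, `OP`, `EOF`) and the tuple-shaped syntax tree are all assumptions made up for this illustration; nothing here is taken from GCC or any real compiler.

```python
import re

# One regex alternative per token class of a hypothetical arithmetic language.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")

def tokenize(text):
    """Lexical analysis: turn characters into (kind, value) tokens."""
    tokens = []
    for number, name, other in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUM", int(number)))
        elif name:
            tokens.append(("ID", name))
        elif other in "+-*/()":
            tokens.append(("OP", other))
        else:
            raise SyntaxError(f"unexpected character {other!r}")
    tokens.append(("EOF", None))
    return tokens

class Parser:
    """Syntax analysis: one method per grammar rule (recursive descent)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.advance()[1]
            node = (op, node, self.term())
        return node

    def term(self):
        # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor := NUM | ID | '(' expr ')'
        kind, value = self.advance()
        if kind in ("NUM", "ID"):
            return (kind, value)
        if (kind, value) == ("OP", "("):
            node = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() != ("EOF", None):
        raise SyntaxError("trailing input")
    return tree

print(tokenize("a + 2*(b - 3)"))
print(parse("a + 2*(b - 3)"))
```

Printing the token list and the tree separately gives the "lexical analyzer output" and "syntax tree" views the question asks for, on a toy scale.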

You could look into tinycc, nwcc, or 8cc if you want to look inside non-optimizing toy C compilers.

You could also look into the intermediate representations of a real compiler, e.g. GIMPLE for GCC (BTW, try compiling some simple C code with a few loops using gcc -fdump-tree-all -O2 -c; you'll be surprised by the hundreds of dump files showing the many internal compiler representations from the many passes). You'll learn a lot by customizing GCC with MELT, and the MELT documentation page contains several very useful references. This answer should also help; it contains or references some pictures of GCC.
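GIMPLE is a three-address representation, so a rough feel for what such an intermediate form looks like can be had from a few lines of code. The sketch below is in Python and is only an illustration: the tuple-shaped input tree, the `lower` function and the `t1`, `t2`, ... temporary names are assumptions invented here, and the output is not GIMPLE syntax.

```python
import itertools

def lower(tree):
    """Flatten a nested expression tree into three-address instructions.

    Leaves are ("NUM", value) or ("ID", name); inner nodes are
    (operator, left, right). Returns (instructions, name_of_result).
    """
    code = []
    temps = (f"t{i}" for i in itertools.count(1))  # fresh temporaries

    def walk(node):
        if node[0] in ("NUM", "ID"):
            return str(node[1])          # leaves need no instruction
        op, left, right = node
        lhs = walk(left)                 # operands are lowered first
        rhs = walk(right)
        dest = next(temps)
        code.append(f"{dest} = {lhs} {op} {rhs}")
        return dest

    result = walk(tree)
    return code, result

# Hypothetical tree for the expression a + 2 * (b - 3)
tree = ("+", ("ID", "a"), ("*", ("NUM", 2), ("-", ("ID", "b"), ("NUM", 3))))
code, result = lower(tree)
for line in code:
    print(line)
```

Each instruction has at most one operator and names its result explicitly, which is what makes later passes (optimization, register allocation, code generation) easier to write than if they worked on the nested tree directly.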

Disclaimer: I am the main author of MELT.

PS. There are very good reasons to bootstrap compilers. So a compiler for a language other than C is unlikely to be coded in C (it is often coded in that language itself), since C is not a good programming language for writing a compiler from scratch.

PPS. If you only know C and no other programming language, I would suggest learning another one (e.g. Scheme with SICP, OCaml, Haskell, Scala, Clojure or Common Lisp) before diving into compilers! Also read something about Programming Language Pragmatics. If you know a bit of Scheme or Lisp, Queinnec's book Lisp In Small Pieces will teach you a lot.

Basile Starynkevitch
1

There are many, many places to start exploring this territory. Many languages, such as Lisp and Forth, include a compilation capability or aspect.

To learn about a C compiler, there is a book about the LCC compiler which includes the source code for the compiler. There are also repositories of old C compilers at The Unix Heritage Society archive (tuhs.org).

Still another angle you could take is to examine the language False (an ancestor of the more famous Brainfuck) which is designed to be implemented with very little code.

Another angle, which connects to your interest in complexity theory, is to learn about the Chomsky Hierarchy of languages and the associated abstract machines which can parse them. This will teach you why Lex and Yacc are separate tools and what each is good for (and how to do it yourself and not need them at all).
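As a small illustration of that split (a sketch in Python, with function names invented here): token shapes such as numbers form a regular language, so a finite automaton, in practice a regular expression, is enough, and that is the job of a Lex-style tool. Arbitrarily nested brackets are context-free but not regular, so recognizing them needs a stack, which is what a Yacc-style parser provides.

```python
import re

# Regular language: a classical regular expression (finite automaton) suffices.
NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?")

def is_number(text):
    """True if the whole string is an integer or decimal literal."""
    return NUMBER.fullmatch(text) is not None

# Context-free language: nesting depth is unbounded, so we need a stack
# (the extra memory a pushdown automaton has over a finite automaton).
def is_balanced(text):
    """True if every ( and [ is closed by the matching bracket, in order."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack

print(is_number("3.14"), is_number("3."))
print(is_balanced("(a[b])"), is_balanced("(a[b)]"))
```

The explicit stack here plays the same role as the call stack in a hand-written recursive-descent parser.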

I am actually on the very same quest myself. I'm currently reading the old 1979 book Anatomy of Lisp, which contains compiler code in, of course, Lisp. But this is OK, because I already have my own homebrewed Lisp interpreter to execute it with.

luser droog
  • For LISP, be sure to read [Lisp In Small Pieces](http://pagesperso-systeme.lip6.fr/Christian.Queinnec/WWW/LiSP.html) by C.Queinnec. – Basile Starynkevitch Mar 13 '15 at 08:39
  • I've got it in my wish list, but when I'm shopping, I never want to cough up the 50USD. *Anatomy* was only 40, the other day. – luser droog Mar 13 '15 at 08:41
  • If you read French fluently, the French original version is slightly cheaper. BTW, I think Queinnec's book is worth the 50US$, but it requires some basic computer science education and a little bit of familiarity with some Lisp. – Basile Starynkevitch Mar 13 '15 at 08:46
  • That's a thought. I do read French, but my vocabulary is all from Moliere, Rostand, and Baudelaire. But, I may be ready, figuratively and financially, when I get through *Anatomy.* I also have SICP on the shelf, but I've put it down several times. There's something I find hard to read about it. dunno what. MIT envy, perhaps. – luser droog Mar 13 '15 at 09:01
1

The Tiger language was designed by Prof. Andrew Appel expressly to illustrate, step by step, the full process of constructing a compiler.

You can google 'tiger language' and read some online resources; there are also some questions and answers here on SO. The better choice, though, would be to get a copy of the book for the implementation language you prefer and implement the parts you're most interested in.

CapelliC