27

There are a lot of books and articles about creating compilers which do all the compilation job at a time. And what about design of incremental compilers/parsers, which are used by IDEs? I'm familiar with first class of compilers, but I have never work with the second one.

I tried to read some articles about Eclipse Java Development Tools, but they describe how to use complete infrastructure(i.e. APIs) instead of describing internal design(i.e. how it works internally).

My goal is to implement incremental compiler for my own programming language. Which books or articles would you recommend me?

Paul Sweatte
  • 24,148
  • 7
  • 127
  • 265
Ilya Lakhin
  • 1,904
  • 1
  • 16
  • 31
  • 2
    Its now a year since you asked this question and I am in a bit of the same situation. Did you find any better information? – Johannes Jun 13 '12 at 14:17
  • 2
    Johannes, sorry for the long time without answering. Unfortunately I have missed your comment. There are several different approaches. And the problem is still poor covered. There are some theoretical articles but too few implementations familiar for engineers. Nevertheless I would suggest you to look at Intellij Idea's approach: http://goo.gl/wMJjk. And especially this framework: http://goo.gl/BaK9B. – Ilya Lakhin Aug 11 '12 at 21:02

3 Answers3

14

This book is worth a look: Builing a Flexible Incremental Compiler Back-End.

Quote from Ch. 10 "Conclusions":

This paper has explored the design of the back-end of an incremental compilation system. Rather than building a single fixed incremental compiler, this paper has presented a flexible framework for constructing such systems in accordance with user needs.

I think this is what you are looking for...

Edit:
So you plan to create something that is known as a "cross compiler"?!
I started a new attempt. Until now, I can't provide the ultimate reference. If you plan such a big project, I'm sure you are an experienced programmer. Therefore it is possible, that you already know these link(s).

Compilers.net
List of certain compilers, even cross compilers (Translators). Unfortunately with some broken links, but 'Toba' is still working and has a link to its source code. May be that this can inspire you.

clang: a C language family frontend for LLVM
Ok, it's for LVVM but source is available in a SVN repository and it seems to be a front end for a compiler (translator). May be that this can inspire you as well.

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
Markus
  • 4,487
  • 3
  • 41
  • 51
  • Markus, thank you for your reply. I hope it would be useful article for me. If I understand correctly it focuses on back-end aspects of design, but I was looking for front-end(and maybe middle-end too) aspects like parsing and AST construction. My own language compiler is of source-to-source class, therefore back-end aspects of its design are probably different of the similar aspects in source-to-bytecode compilers. Is it possible to read some articles which are focused on front-end aspects? – Ilya Lakhin Jun 01 '11 at 09:44
  • It was the most close answer I have received here. Also I have found interesting links in the References section of the paper. So I have marked this answer as "accepted". Thank you again! – Ilya Lakhin Jun 02 '11 at 05:12
4

I'm going to disagree with conventional wisdom on this one because most conventional wisdom makes unwritten assumptions about your goals, such as complete language designs and the need for extreme efficiency. From your question, I am assuming these goals:

  • learn about writing your own language
  • play around with your language until it looks elegant
  • try to emit code into another language or byte code for actual execution.

You want to build a hacking harness and a recursive descent parser.

Here is what you might want to build for a harness, using just a text based processor.

  1. Change the code fragment (now "AT 0700 SET HALLWAY LIGHTS ON FULL")
  2. Compile the fragment
  3. Change the code file (now "tests.l")
  4. Compile from file
  5. Toggle Lexer output (now ON)
  6. Toggle Emitter output (now ON)
  7. Toggle Run on home hardware (now OFF)

    Your command, sire?

You will probably want to write your code in Python or some other scripting language. You are optimizing your speed of play, not execution. A recursive descent parser might look like:

def cmd_at():
    if next_token.type == cTIME:
        num = next_num()
        emit("events.setAlarm(events.DAILY, converttime(" + time[0:1] + ", " 
           + time[2:] + ", func_" + num + ");")
        match_token(cTIME)
        match_token(LOCATION)
        ...

So you need to write:

  • A little menu for hacking.
  • Some lexing routines, to return different tokens for numbers, reserved words, and the like.
  • A bunch of logic for what your language

This approach is aimed at speeding up the cycle for hacking together the language. When you have finished this approach, then you reach for BISON, test harnesses, etc.

Making your own language can be a wonderful journey! Expect to learn. Do not expect to get rich.

Charles Merriam
  • 19,908
  • 6
  • 73
  • 83
  • 1
    Charles, thank you for your detailed reply. It might be a good advice for those who begin working on compiler/language design first time. But I have already done these steps. I have a complete programming language design with a prototype of compiler. Now I am going to rewrite the compiler to make it possible to implement some kind of IDE for that language. In another words I want to do "bootstrapping step". Therefore I'm interested in incremental parsing design(which is different from regular recursive descend or ascend parsing technique). And that's why I'm asking this question. :) – Ilya Lakhin Jun 02 '11 at 05:48
1

I see that there is an accepted answer, but I think that some additional material could be usefully included on this page.

I read the Wikipedia article on this topic and it linked to a DDJ article from 1997:

http://www.drdobbs.com/cpp/codestore-and-incremental-c/184410345?pgno=1

The meat of the article is the first page. It explains that the code in the editor is divided into pieces that are "incorporated" into a "CodeStore" (database). The pieces are incorporated via a work queue which contains unincorporated pieces. A piece of code may be parsed and returned to the work queue multiple times, with some failure on each attempt, until it goes through successfully. The database includes dependencies between the pieces so that when the source code is edited the effects on the edited piece and other pieces can be seen and these pieces can be reprocessed.

I believe other systems approach the problem differently. Java presents different problems than C/C++ but has advantages as well, so Eclipse perhaps has a different design.

cardiff space man
  • 1,442
  • 14
  • 31