12

Question:

What is the Zephyr ASDL and how does it relate to other compiler technologies like lexers and parser generators?

(I would appreciate it if you were reasonably complete, but point to other references online when it gets rather technical, because most of what I know about compilers come from playing with yacc and flex, writing a simple maximal munch lexer in C, and looking up and reading stuff online)

Question Background:

I've been reading http://docs.python.org/devguide/compiler.html and I came across the following line:

The specification of the AST nodes is specified using the Zephyr Abstract Syntax Definition Language (ASDL).

I followed the citation at the bottom to find: http://www.cs.princeton.edu/research/techreps/TR-554-97.

My first reading through the article has been rather tumultuous, and I was hoping I could first get a better understanding of what the purpose of ASDL was (in context of the compilation process), before trying again.

Raymond Hettinger
  • 216,523
  • 63
  • 388
  • 485
math4tots
  • 8,540
  • 14
  • 58
  • 95

3 Answers3

8

Lexer and Parser generators accept descriptions of lexemes and grammars, and generate code that implement the corresponding artifact. Lex requires a regular expression to describe tokens. Parser generators take various kinds of extended BNF notations.

The paper you reference is pretty clear IMHO: ASDL is a little language for describing abstractly a set of tree node (their types and signatures). Using this language, one can write (and the paper's authors did so) a tool that converts these descriptions into the set of record types that you'd need to implement trees to be used with a parser. So ADSL is kind of like Regexes and BNF, in that its purpose is to be fed to a code generator that produces a part of a compiler.

An expansive view is that compilers are a pretty well-understood technology, and that one ought to be able to generate them from descriptions of various pieces. Regex/BNF/ADSL are the essential keys to the parsing phase.

You'd ideally like description languages for target instruction sets, flow analyses, translations (you mentioned maximal munch) from the abstract trees to the target instruction set, and a way to describe optimizations. Then with corresponding tools for each piece, you could build the entire compiler from "specifications". There's actually been a lot of work in this area; people have done all of these separately and together. Unsurprisingly some of it comes from the "Zephyr" project formerly out of Princeton (seems the Zephyr web site there is now dead), whose goal was to do just this kind of thing.

Anyway try looking under Google Scholar for "compiler generator".

Delimitry
  • 2,987
  • 4
  • 30
  • 39
Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
2

ASDL is used when you need to generate a tree in a module and input the same tree in other module (or almost the same tree, somehow optimised).

For this, you need to have functions of construction (ideally with type checker), function of printing the tree such that visualising it you are sure you generated it correctly.

ASDL takes as input some tree written in a syntax almost identical with the syntax of algebraic data type (like in haskell or ml), or the syntax in BNF but much more simplified, and auto-generates all the contructors, printing functions starting with the simple description of a tree.

For example, if you have a lexer, it will have to generate lexemes that have a type. You also need to see the output stream of lexemes (this is in linear form, so a very simple tree). Instead of writing functions for printing, constructing lexemes, you define them something like that

   lexeme=
       ID(STRING)
     | INT(num_integer)
     | FLOAT(num_float)
     attributes(int coord_x, int coord_y)
   num_integer:
     ....
   num_float:
     ....

and you call constructors ID, INT, FLOAT, etc from your lexer. ASDL will convert this simple syntax in all the functions you need, either to construct nodes for AST, or to print, or whatever you need. ASDL does not impose restrictions on the generated code.

If you add attributes to a type, such as the coordinates of a token, such attributes are appended to the parameters of each contructor from that type.

A more complex tree, created by a parser would look like that

expr:  SUM(expr, expr)
      |PRODUCT(expr, expr)
      |number
number: num_integer

In this case asdl will check that the call of SUM(_ _) made by the parser will pass to sum nodes created with one of the constructors of expr. num_integer is defined externally, maybe by an asdl tree for the lexer.

Note that you are not allowed to define constructors containing regular expressions, such as number: [0-9]+. ASDL is simpler than EBNF.

These constructors will be defined such that to build what you need and more than that, they type check, to be sure that your lexer/parser/code generator outputs trees that conform the language defined by asdl.

To well understand ASDL you need to write 3-4 parsers and see what is common in the code they generate. That common part is in fact ASDL, so this is an abstraction for the output of the parsers in particular.

alinsoar
  • 15,386
  • 4
  • 57
  • 74
0

Your question is about the difference between concrete syntax and abstract syntax.

  • Concrete syntax is the one you know and need to respect, when you use a programming language. This concrete syntax is also verified by the parser, that check whether you respect the grammar language or not. The parser has a second role, hidden to the programmer : build a dedicated data structure in memory, representative of your input source. Many algorithm will be applied onto this data structure. This data structure is named "Abstract Syntax Tree".
  • Abstract syntax : this a a set of classes (in the Object-oriented paradigm or types in the Functional paradigm) from which the AST is elaborated. For instance, you may have a class named "Program" that captures the essence of what a program is made of. You can also have a "If" class that represent the structure of a "if" statement (made of a condition, a regular body and probably a second one for the "else" part).

ASDL like Zephyr is about how to describe this set of classes for the abstract syntax, but not the grammar per se.

Once the set of classes is fully described (or even generated), it can be used within the parser, that elaborates the AST.

JCLL
  • 5,379
  • 5
  • 44
  • 64