ASDL is used when you need to generate a tree in one module and consume the same tree in another module (or almost the same tree, perhaps optimised).
For this you need constructor functions (ideally type-checked) and a function that prints the tree, so that by inspecting the output you can be sure you generated it correctly.
ASDL takes as input a tree description written in a syntax almost identical to that of algebraic data types (as in Haskell or ML), or to a much simplified BNF, and auto-generates all the constructors and printing functions from that simple description.
For example, if you have a lexer, it has to generate lexemes, each of which has a type. You also need to see the output stream of lexemes (it is linear, so a very simple tree). Instead of writing the functions for constructing and printing lexemes by hand, you define them something like this:
lexeme=
ID(STRING)
| INT(num_integer)
| FLOAT(num_float)
attributes(int coord_x, int coord_y)
num_integer:
....
num_float:
....
and you call the constructors ID, INT, FLOAT, etc. from your lexer. ASDL converts this simple syntax into all the functions you need, whether to construct AST nodes, to print them, or anything else. ASDL does not impose restrictions on the generated code.
If you add attributes to a type, such as the coordinates of a token, those attributes are appended to the parameters of each constructor of that type.
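To make the idea concrete, here is a minimal Python sketch of the kind of code such a generator could produce for the lexeme description above. It is not the output of any real ASDL tool. The names CONSTRUCTORS, ATTRIBUTES, generate and show are hypothetical, and mapping STRING, num_integer and num_float to str, int and float is an assumption, since the text leaves those types elided.

```python
from dataclasses import fields, make_dataclass

# Hypothetical in-memory form of the `lexeme` description.
# Assumption: STRING, num_integer, num_float map to str, int, float.
CONSTRUCTORS = {
    "ID": [("value", str)],
    "INT": [("value", int)],
    "FLOAT": [("value", float)],
}
ATTRIBUTES = [("coord_x", int), ("coord_y", int)]


class Lexeme:
    """Common base class for every constructor of the `lexeme` type."""


def generate(constructors, attributes, base):
    """Build one class per constructor; attributes are appended to its fields."""
    return {
        name: make_dataclass(name, own_fields + attributes, bases=(base,), frozen=True)
        for name, own_fields in constructors.items()
    }


def show(node):
    """Render a node as CONSTRUCTOR(arg, ...), in field order."""
    args = ", ".join(repr(getattr(node, f.name)) for f in fields(node))
    return f"{type(node).__name__}({args})"


generated = generate(CONSTRUCTORS, ATTRIBUTES, Lexeme)
ID, INT, FLOAT = generated["ID"], generated["INT"], generated["FLOAT"]

# What a lexer might emit for the input `x 42`: value first, then coordinates.
tokens = [ID("x", 1, 1), INT(42, 3, 1)]
for token in tokens:
    print(show(token))
```

The lexer only ever calls ID, INT and FLOAT. The coordinates arrive as trailing parameters because the attribute list is appended to each constructor's own fields.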
A more complex tree, created by a parser, would look like this:
expr: SUM(expr, expr)
    | PRODUCT(expr, expr)
    | number
number: num_integer
In this case ASDL will check that a call to SUM(_, _) made by the parser passes nodes created with one of the constructors of expr. num_integer is defined externally, perhaps by an ASDL tree for the lexer.
Note that you are not allowed to define constructors containing regular expressions, such as number: [0-9]+. ASDL is simpler than EBNF.
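The check on SUM can be sketched in Python as follows. This is hand-written for illustration, not generated by an ASDL tool. The class names Expr and Number and the helper _check_expr are hypothetical, and treating number as a wrapper around a Python int is an assumption standing in for num_integer.

```python
from dataclasses import dataclass


class Expr:
    """Marker base class: anything built by a constructor of `expr`."""


def _check_expr(value, where):
    """Reject arguments that were not built by an `expr` constructor."""
    if not isinstance(value, Expr):
        raise TypeError(f"{where}: expected expr, got {type(value).__name__}")


@dataclass(frozen=True)
class Number(Expr):
    value: int  # assumption: stands in for num_integer


@dataclass(frozen=True)
class SUM(Expr):
    left: Expr
    right: Expr

    def __post_init__(self):
        _check_expr(self.left, "SUM.left")
        _check_expr(self.right, "SUM.right")


@dataclass(frozen=True)
class PRODUCT(Expr):
    left: Expr
    right: Expr

    def __post_init__(self):
        _check_expr(self.left, "PRODUCT.left")
        _check_expr(self.right, "PRODUCT.right")


# A well-formed tree for 1 + 2 * 3.
tree = SUM(Number(1), PRODUCT(Number(2), Number(3)))

# A malformed call is rejected when the node is constructed.
try:
    SUM(Number(1), "2")
except TypeError as err:
    error_message = str(err)
```

The error surfaces at the constructor call inside the parser, not later when another module reads the tree.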
These constructors are generated to build what you need and, beyond that, they type check, so you can be sure that your lexer, parser, or code generator outputs trees that conform to the language defined by ASDL.
To understand ASDL well, you need to write three or four parsers and see what their tree-building code has in common. That common part is in fact ASDL, so it is an abstraction for the output of parsers in particular.