ANTLR or SableCC for DSL Implementation?

Question

Has anybody used both for language implementation and can compare them, pointing out strengths and weaknesses? I am looking for a RAD tool with support for generating AST walker code. SableCC is LALR and thus supports left recursion, whereas ANTLR is LL(*). Is this important for typical grammars or DSLs? I also need to perform some domain-specific analysis. (The target language of my compiler will be OpenCL C.) As this is for a student project, it is important that I do not lose too much time on the tedious side, that is, implementing the front end of the language.

If you are targeting OpenCL, won't you have to know a lot about the data types of the operands in your DSL? This implies symbol tables and type inference, for which neither ANTLR nor SableCC provides any specific support. I'm just observing that you should choose your DSL tool considering what mechanisms it needs to provide, above and beyond just parsing. If *all* you need is parsing (how do you know??), then either of these will likely be a fine choice. — Ira Baxter, Dec 15 '11 at 21:50
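To make the symbol-table point concrete, here is a minimal Java sketch of a scoped symbol table of the kind a DSL front end needs before it can check operand types and emit OpenCL. It is not a feature of ANTLR or SableCC; the class names and the `Type` values are hypothetical, chosen for a float-only DSL with arrays.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Minimal scoped symbol table (a sketch, not an ANTLR or SableCC feature).
public class SymbolTableSketch {

    // Hypothetical type set for a float-only DSL with arrays.
    enum Type { FLOAT, FLOAT_ARRAY }

    static final class SymbolTable {
        // Innermost scope sits at the head of the deque.
        private final Deque<Map<String, Type>> scopes = new ArrayDeque<>();

        SymbolTable() { enterScope(); } // global scope

        void enterScope() { scopes.push(new HashMap<>()); }

        void exitScope() {
            if (scopes.size() == 1) throw new IllegalStateException("cannot exit the global scope");
            scopes.pop();
        }

        // Declares a name in the innermost scope; redeclaring in the same scope is an error.
        void declare(String name, Type type) {
            Map<String, Type> current = scopes.peek();
            if (current.containsKey(name)) throw new IllegalArgumentException("redeclared: " + name);
            current.put(name, type);
        }

        // Searches from innermost to outermost scope; returns null for undeclared names.
        Type lookup(String name) {
            for (Map<String, Type> scope : scopes) {
                Type t = scope.get(name);
                if (t != null) return t;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        table.declare("v", Type.FLOAT_ARRAY);   // e.g. one membrane potential per neuron
        table.enterScope();
        table.declare("dt", Type.FLOAT);
        System.out.println(table.lookup("v"));  // FLOAT_ARRAY (found in the outer scope)
        System.out.println(table.lookup("dt")); // FLOAT
        table.exitScope();
        System.out.println(table.lookup("dt")); // null (its scope has been closed)
    }
}
```

Type checking would then consult `lookup` for every identifier node, for example to reject indexing into a plain `FLOAT`.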
The only data type supported will be single-precision floating-point numbers, as integer arithmetic is not nearly as fast on most GPU architectures. I still have to do some reading on the domain (neuron models), but primarily I think I will have to walk a tree. I have heard SableCC generates classes implementing the Visitor design pattern. On the other hand, ANTLR has a much larger user community, and I can get better tutorials, books and documentation. — Matthias Hueser, Dec 16 '11 at 02:20
If it's a GPU, you'll need (multidimensional) arrays (implicit or explicit) of whatever data type you think you are supporting, or you won't be able to get the computational horsepower. If it is a good DSL, it won't be poisoned by the implementation's insistence on single-precision floats, and so I'd expect you to have ints, and floats of various precisions appropriate for the *problem*, not the GPU. If you're a student, you can do anything you like. — Ira Baxter, Dec 16 '11 at 04:55
... as far as code generation is concerned, if you parse into a tree, you have no choice but to walk it somewhere before the code gets generated. The real problem is: how do you generate efficient code for the GPU, given your DSL specification? Unless your DSL domain trivially matches what the GPU does, you'll have to collect information from different parts of the tree to decide what to generate for each tree node. So a simple "linear" tree walk likely won't do the trick. You need to read about optimizing compilers, or give up on the idea of good code generation. — Ira Baxter, Dec 16 '11 at 05:04
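A minimal Java sketch of that "collect first, generate second" idea, assuming a hypothetical DSL whose statements have the form `target = left op right` over per-element float buffers. The first pass walks the whole body to find which buffers are only read; the second pass uses that fact to mark them `const` in the OpenCL kernel signature. All names are invented for illustration, and a real compiler would need far more analysis than this.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TwoPassSketch {

    // Hypothetical DSL statement "target = left op right"; every name is a per-element float buffer.
    record Assign(String target, String left, char op, String right) {}

    // Facts collected by the analysis pass.
    record Analysis(Set<String> inputs, Set<String> outputs) {}

    // Pass 1: walk the whole body; a name read before it is assigned must come from outside.
    static Analysis analyze(List<Assign> body) {
        Set<String> inputs = new LinkedHashSet<>();
        Set<String> outputs = new LinkedHashSet<>();
        for (Assign a : body) {
            for (String read : List.of(a.left(), a.right())) {
                if (!outputs.contains(read)) inputs.add(read);
            }
            outputs.add(a.target());
        }
        return new Analysis(inputs, outputs);
    }

    // Pass 2: emit OpenCL C. Whether a parameter is const depends on the whole body,
    // so the signature cannot be produced by a single left-to-right walk.
    static String generate(String kernelName, List<Assign> body, Analysis info) {
        Set<String> buffers = new LinkedHashSet<>(info.inputs());
        buffers.addAll(info.outputs());
        List<String> params = new ArrayList<>();
        for (String b : buffers) {
            String qualifier = info.outputs().contains(b) ? "__global float* " : "__global const float* ";
            params.add(qualifier + b);
        }
        StringBuilder out = new StringBuilder();
        out.append("__kernel void ").append(kernelName)
           .append('(').append(String.join(", ", params)).append(") {\n");
        out.append("    size_t i = get_global_id(0);\n");
        for (Assign a : body) {
            out.append("    ").append(a.target()).append("[i] = ")
               .append(a.left()).append("[i] ").append(a.op()).append(' ')
               .append(a.right()).append("[i];\n");
        }
        return out.append("}\n").toString();
    }

    public static void main(String[] args) {
        List<Assign> body = List.of(
                new Assign("v", "v", '+', "dv"),
                new Assign("w", "v", '*', "a"));
        System.out.print(generate("step", body, analyze(body)));
    }
}
```

For the `main` above, the signature comes out as `__kernel void step(__global float* v, __global const float* dv, __global const float* a, __global float* w)`: the `const` on `dv` and `a` is only known after the entire body has been seen.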

score 1 · Answer 1 · answered Jul 30 '12 at 20:41

I cannot say much about ANTLR, but I can offer some information about SableCC.

Design

It generates a parser in which generated code and hand-written code are cleanly separated using the Visitor pattern, and it integrates the transformation from Concrete Syntax Tree to Abstract Syntax Tree. As a result, the designer gets an AST once the parser has successfully parsed the input, and can then walk through the tree and perform actions on the corresponding nodes.
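What the CST-to-AST transformation removes can be shown with a hand-written Java sketch. SableCC 3 lets the grammar declare these transformations itself; the code below uses neither SableCC's syntax nor its generated classes. It only shows the effect on a hypothetical expression grammar `exp : exp '+' term | term ; term : '(' exp ')' | NUMBER ;`: chain productions and punctuation tokens disappear, and only `Add` and `Num` nodes remain.

```java
import java.util.List;

public class CstToAstSketch {

    // Concrete syntax tree: every grammar symbol and token, including chain rules and parentheses.
    record Cst(String symbol, List<Cst> children, String token) {
        static Cst leaf(String token) { return new Cst("TOKEN", List.of(), token); }
        static Cst node(String symbol, Cst... kids) { return new Cst(symbol, List.of(kids), null); }
    }

    // Abstract syntax tree: only what later passes need.
    interface Ast {}
    record Num(String digits) implements Ast {}
    record Add(Ast left, Ast right) implements Ast {}

    // Hypothetical grammar:  exp  : exp '+' term | term ;
    //                        term : '(' exp ')' | NUMBER ;
    static Ast toAst(Cst c) {
        List<Cst> k = c.children();
        switch (c.symbol()) {
            case "exp":
                if (k.size() == 3) return new Add(toAst(k.get(0)), toAst(k.get(2))); // drop the '+' token
                return toAst(k.get(0));                                              // collapse exp -> term
            case "term":
                if (k.size() == 3) return toAst(k.get(1));                           // drop '(' and ')'
                return new Num(k.get(0).token());                                    // NUMBER leaf
            default:
                throw new IllegalArgumentException("unexpected symbol: " + c.symbol());
        }
    }

    public static void main(String[] args) {
        // CST for the input "(1 + 2)"
        Cst cst = Cst.node("exp",
                Cst.node("term",
                        Cst.leaf("("),
                        Cst.node("exp",
                                Cst.node("exp", Cst.node("term", Cst.leaf("1"))),
                                Cst.leaf("+"),
                                Cst.node("term", Cst.leaf("2"))),
                        Cst.leaf(")")));
        System.out.println(toAst(cst)); // Add[left=Num[digits=1], right=Num[digits=2]]
    }
}
```

Nine CST nodes shrink to three AST nodes, which is what makes later tree walks short and robust against grammar refactoring.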

The designer can first write and debug the grammar and refine the transformation from Concrete Syntax Tree to Abstract Syntax Tree. Once he has a solid AST, he can write action code in separate classes. So the designer writes the grammar only once and can write several kinds of actions for it, for example one for syntax highlighting, one for semantic analysis and one for code generation. I have done this in a production system, and it works very well.
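The separation described above can be sketched in plain Java. The classes below are hand-written stand-ins that imitate the general shape of SableCC-generated code (node classes with `apply`, and an adapter whose `caseXxx` methods walk the tree and call empty `inXxx`/`outXxx` hooks, as SableCC's `DepthFirstAdapter` does); the real generated classes have more members and differ in detail. Two independent hand-written passes then reuse the same tree by overriding only the hooks they need.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hand-written stand-ins imitating the shape of SableCC-generated code (simplified).
public class VisitorSketch {

    // ---- "generated" part: node classes and a depth-first walker ----
    interface Analysis {
        void caseANumExp(ANumExp node);
        void caseAAddExp(AAddExp node);
    }

    static abstract class PExp {
        abstract void apply(Analysis analysis);
    }

    static final class ANumExp extends PExp {
        final String text;
        ANumExp(String text) { this.text = text; }
        @Override void apply(Analysis analysis) { analysis.caseANumExp(this); }
    }

    static final class AAddExp extends PExp {
        final PExp left, right;
        AAddExp(PExp left, PExp right) { this.left = left; this.right = right; }
        @Override void apply(Analysis analysis) { analysis.caseAAddExp(this); }
    }

    // Walks children in order and calls empty in/out hooks, in the spirit of DepthFirstAdapter.
    static class DepthFirstAdapter implements Analysis {
        public void inANumExp(ANumExp node) {}
        public void outANumExp(ANumExp node) {}
        public void inAAddExp(AAddExp node) {}
        public void outAAddExp(AAddExp node) {}

        @Override public void caseANumExp(ANumExp node) { inANumExp(node); outANumExp(node); }

        @Override public void caseAAddExp(AAddExp node) {
            inAAddExp(node);
            node.left.apply(this);
            node.right.apply(this);
            outAAddExp(node);
        }
    }

    // ---- hand-written part: separate classes, each overriding only the hooks it needs ----

    // Pass 1: print the expression in postfix form (e.g. for debugging).
    static class PostfixPrinter extends DepthFirstAdapter {
        final List<String> out = new ArrayList<>();
        @Override public void outANumExp(ANumExp node) { out.add(node.text); }
        @Override public void outAAddExp(AAddExp node) { out.add("+"); }
    }

    // Pass 2: evaluate the expression with an operand stack filled by the out-hooks.
    static class Evaluator extends DepthFirstAdapter {
        final Deque<Double> stack = new ArrayDeque<>();
        @Override public void outANumExp(ANumExp node) { stack.push(Double.parseDouble(node.text)); }
        @Override public void outAAddExp(AAddExp node) {
            double right = stack.pop();
            double left = stack.pop();
            stack.push(left + right);
        }
    }

    public static void main(String[] args) {
        // AST for 1 + (2 + 3)
        PExp tree = new AAddExp(new ANumExp("1"), new AAddExp(new ANumExp("2"), new ANumExp("3")));

        PostfixPrinter printer = new PostfixPrinter();
        tree.apply(printer);
        System.out.println(String.join(" ", printer.out)); // 1 2 3 + +

        Evaluator evaluator = new Evaluator();
        tree.apply(evaluator);
        System.out.println(evaluator.stack.peek());        // 6.0
    }
}
```

Regenerating the "generated" part after a grammar change leaves `PostfixPrinter` and `Evaluator` untouched unless the affected node classes themselves change.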

With ANTLR, the designer can construct the AST by adding action code to the grammar, and then reuse the AST in different ways. But there is no clean separation between generated code and hand-written code.

Another aspect may be IDE support. Since with SableCC your hand-written code lives in separate classes, you can easily use the auto-complete function of your IDE.

Grammar

SableCC is an LALR(1) parser generator, so IMO it is easier to write a grammar for it than for ANTLR, which is an LL-based parser generator (LL(*) in ANTLR 3), at least without tricks such as left-recursion elimination. I think (but I am not sure) that SableCC is the only LR-based Java parser generator that is this popular.
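Left recursion is the main "trick" an LL grammar needs. The Java sketch below is a hand-written recursive-descent parser (not generated by either tool) for the left-recursive rule `expr : expr '-' NUM | NUM ;`, which an LALR generator such as SableCC accepts as written. A top-down parser has to rewrite it as iteration, `expr : NUM ('-' NUM)* ;`, and fold the tree to the left inside the loop; the naive right-recursive rewrite `expr : NUM '-' expr` would parse `10 - 3 - 2` as `10 - (3 - 2)`. (ANTLR 4, released after this answer, rewrites directly left-recursive rules automatically.)

```java
import java.util.List;

// Hand-written recursive-descent parser (a sketch, not generated by ANTLR or SableCC).
// Left-recursive rule, accepted as written by LALR generators:
//     expr : expr '-' NUM | NUM ;
// A naive parseExpr() would call itself before consuming any token and recurse forever,
// so the LL version uses iteration:   expr : NUM ('-' NUM)* ;
public class LeftRecursionSketch {

    private final List<String> tokens;
    private int pos = 0;

    LeftRecursionSketch(List<String> tokens) { this.tokens = tokens; }

    // Returns a fully parenthesized string so the associativity is visible.
    String parseExpr() {
        String left = expectNumber();
        while (pos < tokens.size() && tokens.get(pos).equals("-")) {
            pos++;                                   // consume '-'
            String right = expectNumber();
            left = "(" + left + " - " + right + ")"; // fold to the left: (a - b) - c
        }
        return left;
    }

    private String expectNumber() {
        if (pos >= tokens.size() || !tokens.get(pos).matches("\\d+")) {
            throw new IllegalStateException("number expected at token " + pos);
        }
        return tokens.get(pos++);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("10", "-", "3", "-", "2");
        System.out.println(new LeftRecursionSketch(tokens).parseExpr()); // ((10 - 3) - 2)
    }
}
```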

Output parser

ANTLR can generate parsers in many languages, while SableCC can only generate parsers in Java (in its mainstream version). There are some plugins/adapters for generating parsers in other languages, but according to their author (http://www.mare.ee/indrek/sablecc/) they are outdated. SableCC 4 can generate more, but it is in beta, which I would not recommend for a serious project.

Development Support

ANTLR has an IDE for writing grammars: ANTLRWorks, which can visualize a grammar and navigate in the source (e.g. jump to the definition of a token or production). SableCC has no such tool. There is a primitive syntax-highlighting script for Vim and a NetBeans plugin with few features.

Conclusion

In my opinion, for a big project that requires long-term maintenance, SableCC is more suitable than ANTLR.

Martin Fowler has an informative article about SableCC; you can find it here: http://martinfowler.com/bliki/HelloSablecc.html

I think the answer is "not just a parser generator". If you are serious about parsing *and analyzing* your DSL code, you will find that choosing a "mere parsing" engine does not help you very much. You need a lot more machinery, and, just as with the parser generator, you really don't want to build all that machinery yourself. See my essay on Life After Parsing: http://www.semanticdesigns.com/Products/DMS/LifeAfterParsing.html — Ira Baxter, Jul 30 '12 at 20:56
Yes, I agree. Parsing is just the beginning; there is a lot to do once one has an AST. So I would rather have an automatically generated AST than write it myself. I'm not a parser or compiler expert, just an amateur. — lazyboy, Jul 30 '12 at 21:01
Neither SABLE nor ANTLR to my understanding automatically generates an AST; you seem to imply SABLE helps somewhat. Our DMS does generate ASTs automatically; grammar in, parser + AST out. — Ira Baxter, Jul 30 '12 at 21:06
So, please tell me how I can try your DMS. Do you have an evaluation license, a trial version or something similar? In SableCC the designer can specify, using rules, which AST to generate from the CST and how; SableCC then generates the AST automatically. So he can tell SableCC which tokens he would like to keep in the AST (also for debugging) and which production must be transformed into which abstract production. For my needs, that is all I need. As long as I cannot try DMS, I cannot be sure that its AST is really the AST I need. — lazyboy, Jul 30 '12 at 21:26
See http://stackoverflow.com/a/6378997/120163 for a sample AST. Contact the company to discuss evaluation copies. — Ira Baxter, Jul 30 '12 at 21:29
Thanks for your answer, it's just too complicated for me (see my nickname ;)) — lazyboy, Jul 30 '12 at 21:34
