6

I'm evaluating Coco/R vs. ANTLR for a C# project, as part of what's essentially scriptable mail-merge functionality. To parse the (simple) scripts, I'll need a parser.

I've focussed on Coco/R and ANTLR because both seem fairly mature and well-maintained and capable of generating decent C# parsers.

Neither seems trivial to use, however, and simplicity is something I'd appreciate - particularly maintainability by others.

Does anyone have any recommendations to make? What are the pros/cons of either for parsing a small language - or am I looking into the wrong things entirely? How well do these integrate into a typical continuous integration setup? What are the pitfalls?

Related: Well, many questions, such as 1, 2, 3, 4, 5.

Eamon Nerbonne

4 Answers

4

We have used Coco for 2 years, having replaced the ANTLR parser we were formerly using. For a typical big-data query (our application), our experience has been as follows. Caveat: we depend on full UTF-8 handling, and the parser is implemented in C++. These numbers are for a language with some 200 EBNF productions.

  • ANTLR: 260 µs/query and a 108 MB memory footprint for the generated parser/lexer
  • Coco: 220 µs/query and a 70 KB memory footprint for the parser/scanner

Initially, Coco had a 1.2 ms startup time and generated several 60 KB tables for mapping UTF-8. We have made many local enhancements to Coco: we eliminated the big tables and the 1.2 ms startup time, and hugely improved the internal documentation (as well as the documentation in the generated code).

Our version of (open source) Coco has a tiny footprint compared to ANTLR, is measurably faster, has no startup delay and just... works. It does not have ANTLR's nice UI, but that never struck us as an issue once we started using Coco.

  • 2
    To be a bit fair, you have to allow the OP to consider enhancing ANTLR with the same level of investment that you have made in Coco. Given your hand-tuned "speedup" is about 10-20%, that seems within reach, and I hear that ANTLR4 is faster than ANTLR3 anyway. The three-orders-of-magnitude difference in memory footprint is a lot more interesting; did you dig into where all that space went? – Ira Baxter Jul 16 '14 at 10:30
3

ANTLR is LL(*), which is as powerful as PEG, though usually much more efficient and flexible. LL(*) degenerates to LL(k) for k>1 when arbitrary lookahead is not necessary.
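
To make the lookahead point concrete, here is a minimal sketch of a decision that needs two tokens of lookahead. It is written in Python for brevity; the toy grammar and the names `tokenize` and `parse_stmt` are assumptions made up for illustration, not anything ANTLR generates. Both alternatives start with an identifier, so the parser can only choose between them by peeking at the second token.

```python
import re

# Hypothetical toy grammar (not from this thread):
#   stmt   = assign | call .
#   assign = IDENT "=" IDENT .
#   call   = IDENT "(" ")" .
IDENT = r"[A-Za-z_]\w*"


def tokenize(text):
    """Split the input into identifiers and single punctuation characters."""
    return re.findall(IDENT + r"|\S", text)


def is_ident(token):
    return re.fullmatch(IDENT, token) is not None


def parse_stmt(tokens):
    """Choose between 'assign' and 'call' using lookahead depth k = 2."""
    def la(k):
        # k-th lookahead token (1-based), or an end marker past the input.
        return tokens[k - 1] if k <= len(tokens) else "<eof>"

    if not is_ident(la(1)):
        raise SyntaxError(f"expected identifier, found {la(1)!r}")
    # la(1) is an identifier in both alternatives, so la(2) decides.
    if la(2) == "=":
        if len(tokens) != 3 or not is_ident(la(3)):
            raise SyntaxError("expected: IDENT = IDENT")
        return ("assign", tokens[0], tokens[2])
    if la(2) == "(":
        if tokens[2:] != [")"]:
            raise SyntaxError("expected: IDENT ( )")
        return ("call", tokens[0])
    raise SyntaxError(f"expected '=' or '(', found {la(2)!r}")
```

An LL(1) tool would need this rule left-factored by hand (parse the identifier first, then branch), whereas a tool with deeper lookahead can accept the grammar as written.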

Terence Parr
  • Is it possible to avoid the use of a scanner in ANTLR? I'm a little worried about maintainability thereof because the set of viable tokens may depend on the active grammar rules (i.e. kind of like conditional keywords such as 'from' in C#). – Eamon Nerbonne Apr 29 '10 at 09:15
  • 1
    Sure. Pass in any object that implements TokenStream. – Terence Parr Jun 02 '10 at 21:48
2

If you're simply merging data into a complicated template, consider Terence Parr's StringTemplate engine. He's the man behind ANTLR. StringTemplate may be better suited and easier to use than a full parser generator. It's a very feature-rich template engine.

There is a C# port available in the downloads.
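
As a rough illustration of the template-merge idea, here is a sketch using Python's standard `string.Template`. This is not StringTemplate's API (which is considerably richer); the template text, field names and the `merge` helper are made up for the example.

```python
from string import Template

# Hypothetical letter template; $name, $order_id and $ship_date are placeholders.
letter = Template("Dear $name,\nYour order $order_id ships on $ship_date.\n")


def merge(template, records):
    """Render the template once per record (one dict of field values each).

    substitute() raises KeyError if a record lacks a placeholder's field,
    which surfaces bad data instead of silently emitting a broken letter.
    """
    return [template.substitute(record) for record in records]


records = [
    {"name": "Ada", "order_id": "A-17", "ship_date": "Monday"},
    {"name": "Grace", "order_id": "B-42", "ship_date": "Friday"},
]
letters = merge(letter, records)
```

If plain placeholder substitution like this covers the requirements, a template engine is probably enough; a parser generator starts to pay off once the scripts need expressions, conditionals or loops with their own syntax.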

Corbin March
  • I saw that - you wouldn't happen to have tried it? I'm a bit leery of using a potentially poorly tested port. – Eamon Nerbonne Apr 27 '10 at 22:19
  • @Eamon Nerbonne - I've used it in a proof-of-concept project without any issues but couldn't comment on how well it's tested. Good luck. – Corbin March Apr 28 '10 at 13:39
  • This answer may not really have covered my needs - but it's certainly a starting point - and that makes it the best answer to me :-). – Eamon Nerbonne May 07 '10 at 09:20
2

Basically, Coco/R generates recursive descent parsers and only supports LL(1) grammars, whereas ANTLR uses backtracking (among other techniques), which allows it to handle more complex grammars. Coco/R parsers are much more lightweight and easier to understand and deploy, but sometimes it's a struggle getting the grammar into a form that Coco/R understands, given its one-token lookahead constraint. For many common programming language grammars (e.g. C++, SQL), it's not possible at all.
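
For a feel of what "recursive descent with one token of lookahead" means, here is a minimal hand-written sketch in Python. The toy expression grammar and all names are assumptions for illustration; it is not Coco/R output, only the same general shape: one method per production, and every decision made by inspecting a single lookahead token.

```python
import re

# Hypothetical toy grammar in EBNF (LL(1) by construction):
#   expr   = term   { ("+" | "-") term } .
#   term   = factor { ("*" | "/") factor } .
#   factor = NUMBER | "(" expr ")" .


def tokenize(text):
    """Split into integer literals and single-character symbols, plus an end marker."""
    return re.findall(r"\d+|\S", text) + ["<eof>"]


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    @property
    def la(self):
        # The single lookahead token: all parsing decisions use only this.
        return self.tokens[self.pos]

    def expect(self, token):
        if self.la != token:
            raise SyntaxError(f"expected {token!r}, found {self.la!r}")
        self.pos += 1

    def expr(self):
        value = self.term()
        while self.la in ("+", "-"):
            op = self.la
            self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.la in ("*", "/"):
            op = self.la
            self.pos += 1
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        if self.la.isdigit():
            value = int(self.la)
            self.pos += 1
            return value
        if self.la == "(":
            self.pos += 1
            value = self.expr()
            self.expect(")")
            return value
        raise SyntaxError(f"unexpected {self.la!r}")


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    parser.expect("<eof>")  # reject trailing input
    return value
```

Code of this shape is easy to step through in a debugger, which is much of the maintainability appeal. The cost is that the grammar has to be arranged so that one token always suffices to pick the next alternative.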