9

What's the best way to create a parser in C++ from a file with grammar?

Szymon Lipiński
  • 5
    What format is the 'file with grammar' in? – CB Bailey Dec 03 '09 at 22:38
  • 2
    http://stackoverflow.com/questions/1669/learning-to-write-a-compiler is the canonical question around here on how to write compilers and interpreters. Many good links there. For a hand-built recursive descent approach, look at the Crenshaw tutorial. – dmckee --- ex-moderator kitten Dec 03 '09 at 22:58

6 Answers

18

You also might want to have a look at these links:

KeatsPeeks
  • I second that. The Boost documentation is really helpful. – anno Dec 04 '09 at 15:37
  • 2
    I would suggest not using `boost::spirit` if you plan on a compiler of any decent size - compile times for parsers built with `boost::spirit` tend to get very large, making even very small changes a PITA (because the whole thing is done with templates) – a_m0d Sep 02 '10 at 23:18
10

It depends heavily on the grammar. I tend to like recursive descent parsers, which are normally written by hand (though it's possible to generate one from a description of the grammar).
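To give a feel for what a hand-written recursive descent parser looks like, here is a minimal sketch. The grammar (integer arithmetic with `+ - * /` and parentheses) and the class name `ExprParser` are assumptions made up for illustration, not something from the question. Each grammar rule becomes one member function, and the call structure mirrors the grammar.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical grammar, chosen only for illustration:
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class ExprParser {
public:
    explicit ExprParser(std::string text) : text_(std::move(text)) {}

    // Parse the whole input and return its value; throws on a syntax error.
    double parse() {
        double value = expr();
        skipSpaces();
        if (pos_ != text_.size())
            throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    void skipSpaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
    }

    // Consume c if it is the next non-space character.
    bool accept(char c) {
        skipSpaces();
        if (pos_ < text_.size() && text_[pos_] == c) {
            ++pos_;
            return true;
        }
        return false;
    }

    // expr := term (('+' | '-') term)*
    double expr() {
        double value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    double term() {
        double value = factor();
        for (;;) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // factor := NUMBER | '(' expr ')'
    double factor() {
        if (accept('(')) {
            double value = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        skipSpaces();
        std::size_t start = pos_;
        while (pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
        if (start == pos_) throw std::runtime_error("expected a number");
        return std::stod(text_.substr(start, pos_ - start));
    }

    std::string text_;
    std::size_t pos_ = 0;
};
```

Calling `ExprParser("1+2*3").parse()` walks `expr`, `term` and `factor` in turn. Operator precedence falls out of the fact that `term` binds tighter than `expr`, which is a large part of why this style is easy to write by hand.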

If you're going to use a parser generator, there are really two good choices: Byacc and Antlr. If you want something that's (reasonably) compatible with yacc, Byacc is (by far) your best choice. If you're starting from the beginning, with neither existing code nor experience that favors using something compatible with yacc, then Antlr is almost certainly your best bet.

Since it's been mentioned, I'll also talk a bit about Bison. I'd avoid Bison like the plague that it is. Brooks's advice to "Plan to throw one away" applies here. Robert Corbett (the author of Byacc) wrote Bison as his first attempt at a parser generator. Unfortunately, he gave it to GNU instead of throwing it away. In a classic case of marketing beating technical excellence, Bison is widely used (and even recommended, by those who don't know better) while Byacc remains relatively obscure.

Spirit has also been mentioned. I found early versions quite discouraging (slow compile times and even minor errors leading to a massive spew of template error messages). I've heard that newer versions have improved, but I haven't had occasion to try it again recently, so I can't really say anything meaningful about a recent version.

Jerry Coffin
8

There are flex and bison: cousins of Lex and Yacc that do take the existence of C++ into account.

rici
Michael Krelin - hacker
3

Have you looked at Lex and Yacc? To quote from section 5 of the linked document:

My preferred way to make a C++ parser is to have Lex generate a plain C file, and to let YACC generate C++ code. When you then link your application, you may run into some problems because the C++ code by default won't be able to find C functions, unless you've told it that those functions are extern "C".
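The linkage issue in that quote can be sketched in a single file. This is only an illustration: in a real build `yylex` would be defined in the C file that Lex generates, whereas here a stand-in definition with made-up token codes takes its place so the snippet compiles on its own. The helper `countTokens` is likewise hypothetical.

```cpp
// What the C++ side (for example the Yacc-generated parser) needs to see:
// a declaration with C linkage, so the name is not mangled and the linker
// can match it against the function compiled as plain C.
extern "C" int yylex(void);

// Stand-in definition, for illustration only. In a real project this
// function comes from the Lex output and is compiled by a C compiler.
// The token codes below are arbitrary; 0 conventionally means end of input.
extern "C" int yylex(void) {
    static const int tokens[] = {258, 43, 258, 0};
    static int next = 0;
    int tok = tokens[next];
    if (tok != 0) ++next;  // stay on the end marker once it is reached
    return tok;
}

// Hypothetical caller written in C++: drains the scanner and counts tokens.
int countTokens() {
    int count = 0;
    while (yylex() != 0) ++count;
    return count;
}
```

Without the `extern "C"` on the declaration, the C++ compiler would look for a mangled symbol name and the link step would fail with an unresolved reference to `yylex`, which is the problem the quoted passage describes.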

Brian Agnew
2

The best way to create a parser is to use lex and yacc.

Matt
Dima
2

I've used bison and found the examples just right for my level. I was able to create a simple calculator with it, though of course it can do much more.

The calculator took input such as 1+2*3 and built a syntax tree. The documentation did not describe how to build the tree, however, and that took me a little time to work out.
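For anyone stuck at the same point, here is a rough sketch of one way a syntax tree for 1+2*3 could be represented. The node types and the helpers `num` and `bin` are invented for illustration and are not taken from the bison documentation. In a yacc-style grammar, each rule's action would call a helper like these to build a node from its children's nodes.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Base class for every node in the (hypothetical) expression tree.
struct Node {
    virtual ~Node() = default;
    virtual double eval() const = 0;
};

// Leaf node holding a literal number.
struct Number : Node {
    explicit Number(double v) : value(v) {}
    double eval() const override { return value; }
    double value;
};

// Interior node: an operator applied to two subtrees.
struct BinaryOp : Node {
    BinaryOp(char o, std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {
        assert(lhs && rhs);  // both children must exist
    }
    double eval() const override {
        double a = lhs->eval();
        double b = rhs->eval();
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default:  return a / b;  // anything else is treated as '/'
        }
    }
    char op;
    std::unique_ptr<Node> lhs;
    std::unique_ptr<Node> rhs;
};

// Helpers that a grammar action could call to create nodes.
std::unique_ptr<Node> num(double v) {
    return std::unique_ptr<Node>(new Number(v));
}
std::unique_ptr<Node> bin(char op, std::unique_ptr<Node> l,
                          std::unique_ptr<Node> r) {
    return std::unique_ptr<Node>(new BinaryOp(op, std::move(l), std::move(r)));
}

// The tree a parser would produce for 1+2*3: '*' binds tighter,
// so it ends up as the right child of '+'.
std::unique_ptr<Node> buildExample() {
    return bin('+', num(1), bin('*', num(2), num(3)));
}
```

Evaluating the tree is then a single call, `buildExample()->eval()`. The same tree could just as well be printed or compiled instead of evaluated, which is the main reason to build it rather than compute values directly in the grammar actions.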

If I were starting again, I'd look into ANTLR, as it looked good and well supported.

Martin.

martsbradley