23

I'm working on a code generator to help unit-test a legacy project that mixes C and C++. I couldn't find any standalone tool that generates stub code from declarations, so I decided to build one; it shouldn't be that hard.

Can anybody point me to a standard grammar, preferably one written for yacc?

I hope I'm not reinventing the wheel; if I am, please let me know.

Best Regards, Kevin

paxdiablo
Kevin Yu
  • To generate stub code from a declaration, first you have to parse the declaration. That in practice means a full C++ parser. You really don't want to do this. – Ira Baxter Jun 17 '09 at 15:10

6 Answers

25

From the C++ FAQ Lite:

38.11 Is there a yacc-able C++ grammar?

The primary yacc grammar you'll want is from Ed Willink. Ed believes his grammar is fully compliant with the ISO/ANSI C++ standard, however he doesn't warrant it: "the grammar has not," he says, "been used in anger." You can get the grammar without action routines or the grammar with dummy action routines. You can also get the corresponding lexer. For those who are interested in how he achieves a context-free parser (by pushing all the ambiguities plus a small number of repairs to be done later after parsing is complete), you might want to read chapter 4 of his thesis.

There is also a very old yacc grammar that doesn't support templates, exceptions, nor namespaces; plus it deviates from the core language in some subtle ways. You can get that grammar here or here.

Jared Oberhaus
  • 2
    If you need to really parse C++, you need machinery that really works. "Not used in anger" means it doesn't work for real C++ code. (I don't understand why this answer was favorited/upvoted so many times given how completely ineffective this answer will be). – Ira Baxter Jul 03 '09 at 08:25
  • 1
    @Ira: My guess as to why it's upvoted is that there really isn't anything better. Parsing C++ is hard. – David Thornley Nov 16 '09 at 20:44
  • 1
    Ira is right. You will likely just end up wasting your time. I'm all for building your own, and plunging down the rabbit hole, if what you want to do is learn. But if you want to get a job done, it is advisable to get something that works out of the box. The DMS tools have other advantages in that they cover a bunch of languages and have additional features that you may find useful in your project. If your time is worth money (i.e. you are not doing it for fun) then the prices are reasonable. – Andre Artus Jun 21 '10 at 08:41
  • Note that the links for Willink's grammar are dead, but the grammar can currently be found at http://www.edwillink.plus.com/projects/fog/CxxGrammar.y – Michael Gaskill Feb 03 '17 at 01:14
4

I've recently found some grammar files for C++ (C++ 1998: ISO/IEC 14882:1998 and C++ 2008: ISO/IEC SC22/WG21 N2723=08-0233) at the grammarware website. The grammars are represented in Enhanced BNF, DMS BNF, BGF, SDF and Rascal notation. It's a pity, though, that the C++ grammars don't seem to get updated (no C++2003 or C++11).

3

Jared's link is the closest thing to a context-free grammar you can get. Certain things do need to be delayed for later, but that is by some arguments better than the context-sensitive grammar of C++.

To make things worse, C++1x will complicate the grammar significantly. To get as far as a perfect parse of C++, a parser will need to implement enough of the standard to correctly do overload resolution, including template argument deduction, which in turn will require the concepts mechanism, lambdas, and in effect almost all of the language, except for two-stage name lookup and exception specifications which, if I recall correctly, do not need actual implementation to parse a program successfully.

In effect, you are halfway to a compiler if you can parse C++.

coppro
  • If you can't do name resolution completely, you are nowhere near a C++ compiler. Parsing is much easier than name resolution. – Ira Baxter Nov 15 '09 at 11:25
  • 1
    No, because parsing requires name resolution; that's my point. C++'s grammar is that bad. – coppro Nov 16 '09 at 20:27
  • C++ parsing does NOT require name resolution if you use a GLR parser. In fact, it is pretty easy and we do it with our DMS tool every day (www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html). If you insist on using an LALR(1) parser that cannot tolerate local ambiguity, *then* you have to name resolve as you parse and I agree that's a mess, but then there's your reason for not doing it that way. Doing name resolution for C++ even with local ambiguities is still pretty hard, I will grant, but not nearly as nasty as when tangled with the parser. – Ira Baxter Nov 21 '09 at 03:24
  • ... and our C++ front end does all that name resolution, too. You're still nowhere near a C++ compiler: you still need flow analysis, optimizing transforms, low-level code generation, register assignment, optimization, ... – Ira Baxter Nov 21 '09 at 03:26
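
The local ambiguity argued about in these comments shows up in a single statement. The sketch below (the namespace and function names are made up for illustration) uses the same tokens `a * b` twice: once where `a` names a type, so the statement is a declaration, and once where `a` names a variable, so it is a multiplication. An LALR(1) parser has to know which case it is in while parsing; a GLR parser can keep both readings and let a later name-resolution pass pick one.

```cpp
// Same token sequence, two different parses, depending on what `a` names.

namespace a_is_type {
struct a {};               // `a` names a type here

inline bool demo() {
    a * b;                 // declaration: b is a pointer to a
    b = nullptr;
    return b == nullptr;
}
}  // namespace a_is_type

namespace a_is_value {
inline int demo() {
    int a = 6, b = 7;      // `a` names a variable here
    return a * b;          // expression: multiplication
}
}  // namespace a_is_value
```

Nothing in the tokens themselves distinguishes the two cases; only the earlier declaration of `a` does.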
2

For another approach, you could consider piggy-backing on an existing compiler.

GCC-XML will "compile" C++ into XML files with a lot of useful information; it may be enough for your purposes.

Unfortunately, GCC-XML is only 1/4-maintained, and getting it to work can be...interesting. Good luck, if you go this route.

Walter Mundt
2

I found this one recently. I haven't tried it out, so I'm not sure whether it works. Could you give more info on the tool you're trying to develop? I downloaded this grammar because I'm working on an instrumentation tool so I can add coverage info for my unit test framework.

After re-reading your comment...

I think this tool fits your needs exactly.

Dushara
  • I'm actually working on something that belongs to a unit-test framework. To test a single translation unit, its external references need to be provided in order to produce a runnable binary, so I'm trying to parse the source code to find declarations and generate stub definitions. – Kevin Yu Mar 10 '09 at 02:30
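
To make the goal in that comment concrete, here is a minimal sketch of the stub-emitting step only. It assumes the hard part (parsing the declaration) has already been done by some front end; `Param`, `FunctionDecl` and `make_stub` are hypothetical names invented for this example, not part of any tool mentioned in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, already-parsed view of a free-function declaration.
struct Param {
    std::string type;
    std::string name;
};

struct FunctionDecl {
    std::string return_type;
    std::string name;
    std::vector<Param> params;
};

// Emit a do-nothing definition so the translation unit under test links.
// Assumption: `return {};` is acceptable for the return type (C++11 or
// later; it does not work for reference return types).
std::string make_stub(const FunctionDecl& decl) {
    std::string out = decl.return_type + " " + decl.name + "(";
    for (std::size_t i = 0; i < decl.params.size(); ++i) {
        if (i != 0) out += ", ";
        out += decl.params[i].type + " " + decl.params[i].name;
    }
    out += ") {";
    if (decl.return_type != "void") out += " return {}; ";
    out += "}";
    return out;
}
```

For a declaration such as `int read_sensor(int channel);` this produces `int read_sensor(int channel) { return {}; }`. Real code also needs member functions, overloads, templates and linkage specifications, which is where a full front end becomes necessary.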
1

Our DMS Software Reengineering Toolkit can be obtained with a robust, full-featured C++ parser. See http://www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html. This builds ASTs and symbol tables, and can infer the type of any expression. DMS enables one to carry out arbitrary analyses and transformations on the C++ code.

One "simple" transformation is instrumenting the code to collect test coverage data; we offer this as a COTS tool. See this paper to understand how DMS does it: http://www.semanticdesigns.com/Company/Publications/TestCoverage.pdf

EDIT September 2013 (This answer was getting a bit stale): DMS's C++ parser/name resolution/control flow analysis handles full C++11, in the ISO-, GNU- and Microsoft variants. It will also parse (and retain) source code containing most preprocessor conditionals. It has an explicit grammar driving the parsing process, unlike GCC or Clang.

Ira Baxter
  • While stackoverflow does not directly favor or disfavor open source and/or free solutions, generally it is a bad idea to put a convoluted link that does not directly point to a solution. If you really want to promote your tool, at least point to a page which has some example code and dependencies that one can use without having to read corporate blurb and walled download links. – Adnan Y Feb 09 '18 at 21:25
  • OP said he wanted a grammar, but that his real problem is parsing C++ to extract information to generate stubs. My answer shows how to solve his real problem by skipping his impractical idea of getting a working grammar (they pretty much don't exist for conventional parser generators) and then doing parsing somehow without solving the name resolution problem (which is a huge amount of work). This "convoluted link" points directly to an answer that is practical. – Ira Baxter Feb 10 '18 at 07:47
  • No need to fan an opinion if you think a question is impractical. Let someone else more qualified for the question answer it or let the question go unanswered and be proven right. Secondly, I did visit that page and could not see any answer but some marketing blurb about a front end that did not have any example nor download link. If there is an actual answer, please edit the answer and add it here in case the site goes down. – Adnan Y Feb 11 '18 at 08:47
  • @Adnan: There are days when I think I'm actually qualified to answer this question. Read my bio if in doubt. Regarding the "marketing blurb": I suppose you missed the part where it explicitly says it produces control and data flow information, which is a necessary condition for OP to get his answer. – Ira Baxter Feb 12 '18 at 08:02
  • This wasn't a personal remark, and should only be taken in context to the question. I'd leave the judgement of whether your answer is a qualified one to your own personal assessment. Have a good day. – Adnan Y Feb 12 '18 at 21:45