Is it possible to parse C++11 with the Bison gLR option and the scanner hack?

Question

I am interested in the evolution of programming languages and the directions it is likely to take in the future. C++ is of particular interest as a widely-used language that puts a heavy burden on compiler writers in order to give users a richly featured language with less "obnoxious" grammar.

My impression is that the most widely used C++11 parsers use some variation of recursive descent. I am curious if anyone has built a C++11 parser using the Bison gLR option and the scanner hack. I've taken a couple of stabs at it myself but find that it's tough to specify the right disambiguation rules for the reduce/reduce and shift/reduce conflicts that inevitably come up with any grammar resembling the published grammar.
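For readers unfamiliar with the term, here is a minimal sketch of what the scanner hack usually means: the lexer asks parser-maintained state whether an identifier currently names a type and returns a different token kind accordingly. The names `SymbolTable`, `TokenKind`, and `classify_identifier` are hypothetical and only illustrate the idea; they are not part of Bison or of any existing C++ front end.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Token kinds a hypothetical C++ lexer might hand to a Bison-generated parser.
enum class TokenKind { Identifier, TypeName };

// Hypothetical symbol table: it only tracks which names are known to be types.
class SymbolTable {
public:
    void declare_type(const std::string& name) { types_.insert(name); }
    bool is_type(const std::string& name) const { return types_.count(name) != 0; }

private:
    std::unordered_set<std::string> types_;
};

// The "scanner hack": the same spelling yields a different token depending on
// declarations the parser has already processed.
TokenKind classify_identifier(const SymbolTable& symbols, const std::string& text) {
    return symbols.is_type(text) ? TokenKind::TypeName : TokenKind::Identifier;
}
```

In a real parser the grammar actions would call something like `declare_type` whenever a typedef, class, or enum declaration is reduced, which is what makes the token stream depend on context.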

"it's tough to specify the right disambiguation rules". Which is probably exactly the reason you don't see many Bison grammars for C++11 around. — n. m. could be an AI, Sep 13 '17 at 21:03
Yes, it's possible to parse C++ with a GLR parser, and yes, it has been done. See the last paragraph in Ira Baxter's answer here: https://stackoverflow.com/questions/243383/why-cant-c-be-parsed-with-a-lr1-parser/1004737. That's not bison, of course. But note that "parsing" is just a small part of analyzing C++ code. First you need to preprocess and afterwards (or during) you need to deduce templates. — rici, Sep 13 '17 at 21:26
I've already read what Ira Baxter wrote, and I'm not sure s/he actually said what you think s/he said. The tool is not Bison, it does not use the scanner hack, and it does not return a single parse tree based on precedence rules. It returns a collection of possible parse trees that are disambiguated later. — Kent G. Budge, Sep 13 '17 at 23:22
@kent: I said that it's not bison. When the disambiguation happens doesn't seem relevant to me; bison's glr parser provides a dynamic merging mechanism (necessary to parse c++) which can either disambiguate immediately or collect alternatives for later. It could consult the symbol table to disambiguate, obviating the need for the "scanner hack". In any event, the most difficult disambiguation is related to template deduction, and that's not amenable to the scanner hack. — rici, Sep 14 '17 at 00:03
@rici: Okay, that's very helpful. I have been using bison in LALR(1) mode for decades, but am less familiar with the newer glr capability and did not appreciate the power of the dynamic merging mechanism. I was experimenting with the %dprec mechanism and it has significant limitations for deep ambiguities. — Kent G. Budge, Sep 14 '17 at 00:30
@Kent: At the end of [this answer](https://stackoverflow.com/a/14589567/1566221) there is an example of a C++ parse which requires template deduction. The punchline is `auto b = foo<IsPrime<234799>>::typen<1>(); // Syntax error if not prime`... — rici, Sep 14 '17 at 16:17
although it could also have been written as the choice of two parses by using `auto b = foo<IsPrime<234799>>::typen<1>(0);`: depending on whether `typen` is a static template function member or a static integer member, the expression is either a constructor invocation or a slightly odd arithmetic operation which compares the result of a comparison with 0. And what `typen` is depends in turn on the (compile-time constant) value of `IsPrime<234799>` — rici, Sep 14 '17 at 16:17
Yes, I've seen that clever example before. gLR is supposed to be able to parse any context-free grammar and let you resolve ambiguities; but since C++ is not a context-free language if you use the scanner hack, I was curious if it was known for sure that a Bison-generated gLR parser using the scanner hack could do the job correctly. The answer I'm hearing is that you can parse C++ with gLR by discarding the scanner hack, since it then really is a context-free grammar you're parsing, but at the cost of having to do a lot of post-parsing disambiguation. Which shouldn't bother me. — Kent G. Budge, Sep 15 '17 at 20:55
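The two-parse situation described in the comments can be reproduced with a small sketch. This is a simplified variant written for illustration, not the code from the linked answer: `is_prime`, `has_divisor`, `with_prime`, and `with_composite` are made-up names, and small numbers stand in for 234799. The same token sequence `::typen<1>(0)` is a call to a function template in one specialization and a chain of comparisons in the other.

```cpp
#include <cassert>

// Naive compile-time primality test (C++11 constexpr: one return statement each).
constexpr bool has_divisor(int n, int d) {
    return d * d > n ? false : (n % d == 0 ? true : has_divisor(n, d + 1));
}
constexpr bool is_prime(int n) { return n >= 2 && !has_divisor(n, 2); }

template <bool Prime> struct foo;

// Prime case: `typen` is a static member function template.
template <> struct foo<true> {
    template <int N> static int typen(int x) { return x + N; }
};

// Non-prime case: `typen` is a static integer constant.
template <> struct foo<false> {
    static constexpr int typen = 0;
};

// Parsed as a call: typen<1>(0).
int with_prime() { return foo<is_prime(7)>::typen<1>(0); }

// Parsed as comparisons: (typen < 1) > (0). Compilers may warn about this.
bool with_composite() { return foo<is_prime(9)>::typen<1>(0); }
```

Which parse applies is only known once the constant expression in the template argument has been evaluated, which is why neither a context-free grammar nor a simple identifier lookup in the scanner can settle it.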
