What does a parser for C++ do until it can differentiate between comparisons and template instantiations?

Question

After reading this question I am left wondering what happens (regarding the AST) when major C++ compilers parse code like this:

struct foo 
{
  void method() { a<b>c; }

  // a b c may be declared here
};

Do they handle it like a GLR parser would or in a different way? What other ways are there to parse this and similar cases?

For example, I think it's possible to postpone parsing the body of the method until the whole struct has been parsed, but is this really possible and practical?

score 2 · Answer 1 · answered Oct 26 '18 at 16:37

2

The answer will obviously depend on the compiler, but the article How Clang handles the type / variable name ambiguity of C/C++ by Eli Bendersky explains how Clang does it. I will simply note some key points from the article:

Clang has no need for a lexer hack: the information goes in a single direction from lexer to parser
Clang knows when an identifier is a type by using a symbol table
C++ requires declarations to be visible throughout the class, even in code that appears before it
Clang gets around this by doing a full parsing/semantic analysis of the declaration, but leaving the definition for later; in other words, it's lexed but parsed after all the declared types are available

answered Oct 26 '18 at 16:37

user10564169

21
1

"Clang has no need for a lexer hack". OK, so that's an improvement over having a lexer hack, but it sounds from your description like they replaced with it with a correspondingly icky parser hack. One man's opinion. – Ira Baxter Oct 27 '18 at 16:43
Although I accepted another answer, because it is more complete on its own, the article you provided is really insightful and I recommend reading it. – panoskj Oct 28 '18 at 09:47

rici · Accepted Answer · 2018-10-27T13:13:22.463

Although it is certainly possible to use GLR techniques to parse C++ (see a number of answers by Ira Baxter), I believe that the approach commonly used in commonly-used compilers such as gcc and clang is precisely that of deferring the parse of function bodies until the class definition is complete. (Since C++ source code passes through a preprocessor before being parsed, the parser works on streams of tokens and that is what must be saved in order to reparse the function body. I don't believe that it is feasible to reparse the source code.)

It's easy to know when a function definition is complete, since braces ({}) must balance even if it is not known how angle brackets nest.

C++ is not the only language in which it is useful to defer parsing until declarations have been handled. For example, a language which allows users to define new operators with different precedences would require all expressions to be (re-)parsed once the names and precedences of operators are known. A more pathological example is COBOL, in which the precedence of OR in a = b OR c depends on whether c is an integer (a is equal to one of b or c) or a boolean (a is equal to b or c is true). Whether designing languages in this manner is a good idea is another question.

Very informative answer, though it could use some reference(s) about gcc/clang deferring the parse. — panoskj, Oct 27 '18 at 12:49
@panoskj: both compilers are open source; you can read the code yourself. I'm not going to do a full description of the parsing code here, particularly since I haven't looked at them in a while. — rici, Oct 27 '18 at 13:07
@rici: it wasn't clear you have read gcc/clang code, in which case I'll take your word for it. — panoskj, Oct 28 '18 at 09:23

What does a parser for C++ do until it can differentiate between comparisons and template instantiations?

2 Answers2