How does a parser for C++ differentiate between comparisons and template instantiations?

Question

In C++, the symbols '<' and '>' are used for comparisons as well as for signifying a template argument. Thus, the code snippet

[...] Foo < Bar > [...]

might be interpreted as any of the following two ways:

An object of type Foo with template argument Bar
Compare Foo to Bar, then compare the result to whatever comes next

How does the parser for a C++ compiler efficiently decide between those two possibilities?

You can have a good idea of what appens by checking what is a LL grammar. ;-) — Caduchon, Feb 14 '16 at 12:00
@Caduchon AFAIK C++ grammar is not LL nor LR. In fact I believe it's not context-free at all; this is hinted in the answer where knowledge about the context seems to be necessary to correctly parse. — Bakuriu, Feb 14 '16 at 18:06
@Bakuriu: Most conventional grammars (single-nonterminal left-hand-side) are context-free by definition. C++ doesn't change this. The interpretation of the syntax is not-context free in practice for most langauges. C++ is not different is this regard. — Ira Baxter, Feb 14 '16 at 18:23
@Askaga: you are confusing "parsing" in the narrow sense with determining the meaning of the syntax, of which part is resolving the meaning of each identifier used at each point in the source code ("name resolution". — Ira Baxter, Feb 14 '16 at 18:25

score 17 · Accepted Answer · edited Jun 10 '18 at 01:36

17

If Foo is known to be a template name (e.g. a template <...> Foo ... declaration is in scope, or the compiler sees a template Foo sequence), then Foo < Bar cannot be a comparison. It must be a beginning of a template instantiation (or whatever Foo < Bar > is called this week).

If Foo is not a template name, then Foo < Bar is a comparison.

In most cases it is known what Foo is, because identifiers generally have to be declared before use, so there's no problem to decide one way or the other. There's one exception though: parsing template code. If Foo<Bar> is inside a template, and the meaning of Foo depends on a template parameter, then it is not known whether Foo is a template or not. The language standard directs to treat it as a non-template unless preceded by the keyword template.

The parser might implement this by feeding context back to the lexer. The lexer recognizes Foo as different types of tokens, depending on the context provided by the parser.

edited Jun 10 '18 at 01:36

Dave

4,282
2
19
24

answered Feb 14 '16 at 11:49

n. m. could be an AI

112,515
14
128
243

To add to this answer: when compiler does not have the context, e.g. when `Foo` is a dependent name, you need to tell compiler that it is a template: `template Foo` – Revolver_Ocelot Feb 14 '16 at 11:50
I see. So since in C++ everything has to be declared before it is used, the lexer and parser stages can be interlinked, and the parser constantly feeds back the current type and variable names to the lexer so it can generate the most probable token, either *identifier* or *type identifier*, right? And in case it is undecidable, the programmer has to prefix the *type identifier* with the "template" keyword. – Askaga Feb 14 '16 at 12:10
1

There actually at least three kinds of identifier the lexer typically distinguishes: 'template', 'typedef/class name' and 'normal identifier'. There are two different prefixes for the case it is unknown: `template` and `typename`. – n. m. could be an AI Feb 14 '16 at 12:27
1

`Foo` is a *template-id*. – T.C. Feb 15 '16 at 00:55

Mikhail Maltsev · Answer 2 · 2016-02-14T12:07:55.500

The important point to remember is that C++ grammar is not context-free. I.e., when the parser sees Foo < Bar (in most cases) knows that Foo refers to a template definition (by looking it up in the symbol table), and thus < cannot be a comparison.

There are difficult cases, when you literally have to guide the parser. For example, suppose that are writing a class template with a template member function, which you want to specialize explicitly. You might have to use syntax like:

 a->template foo<int>();

(in some cases; see Calling template function within template class for details)

Also, comparisons inside non-type template arguments must be surrounded by parentheses, i.e.:

foo<(A > B)>

not

foo<A > B>

Non-static data member initializers bring more fun: http://open-std.org/JTC1/SC22/WG21/docs/cwg_active.html#325

great point. It is a bit an off-topic question. Let's say the C++ designers or designers of a new language decide using another symbol for template parameters rather than `< >`. The candidates `( )`, `[ ]` and `{ }` are used for the other purposes. Would there be any other alternative for a language? — barej, Jan 10 '19 at 11:54

score 5 · Answer 3 · answered Feb 14 '16 at 11:52

C and C++ parsers are "context sensitive", in other words, for a given token or lexeme, it is not guaranteed to be distinct and have only one meaning - it depends on the context within which the token is used.

So, the parser part of the compiler will know (by understanding "where in the source it is") that it is parsing some kind of type or some kind of comparison (This is NOT simple to know, which is why reading the source of competent C or C++ compiler is not entirely straight forward - there are lots of conditions and function calls checking "is this one of these, if so do this, else do something else").

The keyword template helps the compiler understand what is going on, but in most cases, the compiler simply knows because < doesn't make sense in the other aspect - and if it doesn't make sense in EITHER form, then it's an error, so then it's just a matter of trying to figure out what the programmer might have wanted - and this is one of the reasons that sometimes, a simple mistake such as a missing } or template can lead the entire parsing astray and result in hundreds or thousands of errors [although sane compilers stop after a reasonable number to not fill the entire universe with error messages]

score 2 · Answer 4 · edited May 23 '17 at 12:02

Most of the answers here confuse determining the meaning of the symbol (what I call "name resolution") with parsing (defined narrowly as "can read the syntax of the program").

You can do these tasks separately..

What this means is that you can build a completely context-free parser for C++ (as my company, Semantic Designs does), and leave the issues of deciding what the meaning of the symbol is to a explicitly seperate following task.

Now, that task is driven by the possible syntax interpretations of the source code. In our parsers, these are captured as ambiguities in the parse.

What name resolution does is collect information about the declarations of names, and use that information to determine which of the ambiguous parses doesn't make sense, and simply drop those. What remains is a single valid parse, with a single valid interpretation.

The machinery to accomplish name resolution in practice is a big mess. But that's the C++ committee's fault, not the parser or name resolver. The ambiguity removal with our tool is actually done automatically, making that part actually pretty nice but if you don't look inside our tools you would not appreciate that, but we do because it means a small engineering team was able to build it.

See an example of resolution of template-vs-less than on C++s most vexing parse done by our parser.

How does a parser for C++ differentiate between comparisons and template instantiations?

4 Answers4

Linked