The C++ standard requires that an implementation perform lexical analysis to produce a stream of tokens, before the parsing stage. According to the lexical analysis rules, two consecutive >
characters (not followed by =
) will always be interpreted as one >>
token. The grammar provided with the C++ standard is defined in terms of these tokens.
The requirement that in certain contexts (such as when expecting a >
within a template-id) the implementation should interpret >>
as two >
is not specified within the grammar. Instead the rule is specified as a special case:
14.2 Names of template specializations [temp.names] ###
After name lookup (3.4) finds that a name is a template-name or that an operator-function-id or a literal-operator-id refers to a set of overloaded functions any member of which is a function template if this is
followed by a <
, the <
is always taken as the delimiter of a template-argument-list and never as the less-than
operator. When parsing a template-argument-list, the first non-nested >
is taken as the ending delimiter
rather than a greater-than operator. Similarly, the first non-nested >>
is treated as two consecutive but
distinct >
tokens, the first of which is taken as the end of the template-argument-list and completes the
template-id. [ Note: The second >
token produced by this replacement rule may terminate an enclosing
template-id construct or it may be part of a different construct (e.g. a cast).—end note ]
Note the earlier rule, that in certain contexts <
should be interpreted as the <
in a template-argument-list. This is another example of a construct that requires context in order to disambiguate the parse.
The C++ grammar contains many such ambiguities which cannot be resolved during parsing without information about the context. The most well known of these is known as the Most Vexing Parse, in which an identifier may be interpreted as a type-name depending on context.
Keeping track of the aforementioned context in C++ requires an implementation to perform some semantic analysis in parallel with the parsing stage. This is commonly implemented in the form of semantic actions that are invoked when a particular grammatical construct is recognised in a given context. These semantic actions then build a data structure that represents the context and permits efficient queries. This is often referred to as a symbol table, but the structure required for C++ is pretty much the entire AST.
These kind of context-sensitive semantic actions can also be used to resolve ambiguities. For example, on recognising an identifier in the context of a namespace-body, a semantic action will check whether the name was previously defined as a template. The result of this will then be fed back to the parser. This can be done by marking the identifier token with the result, or replacing it with a special token that will match a different grammar rule.
The same technique can be used to mark a <
as the beginning of a template-argument-list, or a >
as the end. The rule for context-sensitive replacement of >>
with two >
poses essentially the same problem and can be resolved using the same method.