The usual way to handle this in a yacc-style LALR(1) parser is to have the lexer decide whether an identifier is a type name by looking it up in the current symbol table, and to return a different token for type names. So you end up with rules like:
declaration: decl_specs declarator_list ';'
decl_specs: TYPE_NAME | ....
declarator: ID | TYPE_NAME | '*' declarator ...
expression: ID | expression '*' expression | ...
Since you can't use a name that has been declared as a type name as a value in an expression, this works well enough for parsing C.
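As a rough illustration of the lexer side of this, here is a minimal sketch in C. Everything in it is an assumption made for the example: the token codes, the flat unscoped table, and the helper names `declare_type` and `classify_identifier` are hypothetical, and a real parser would take its token codes from the yacc-generated header and use a properly scoped symbol table.

```c
#include <string.h>

/* Hypothetical token codes; a real parser would get these from the
   header that yacc/bison generates. */
enum { ID = 258, TYPE_NAME = 259 };

/* Toy symbol table: a flat list of names currently declared as types.
   A real table would be scoped, so inner declarations can hide typedefs. */
#define MAX_TYPES 64
static const char *type_names[MAX_TYPES];
static int n_type_names = 0;

/* Hypothetical helper, called from the parser action that handles a
   typedef. Stores the pointer only, so the caller must keep the string
   alive. Returns 0 on success, -1 if the table is full. */
static int declare_type(const char *name)
{
    if (n_type_names == MAX_TYPES)
        return -1;
    type_names[n_type_names++] = name;
    return 0;
}

/* Hypothetical helper, called by the lexer after it has scanned an
   identifier: picks the token to return to the parser. */
static int classify_identifier(const char *name)
{
    for (int i = 0; i < n_type_names; i++)
        if (strcmp(type_names[i], name) == 0)
            return TYPE_NAME;
    return ID;
}
```

With this in place, once the parser has seen `typedef int foo;` and called `declare_type("foo")`, the input `foo * x;` reaches the grammar as TYPE_NAME '*' ID ';' and can only match the declaration rule, not the multiplication in the expression rule.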
When trying to parse C++, this doesn't work well, both because you have explicit namespace operations (with ::) and because you can have declarations that look like function calls. In general, in C++ you're going to need more than 1 token of lookahead to resolve some of these things, so LALR(1) parsers don't work too well.
With bison, you can use the %glr-parser option to generate a GLR parser instead of an LALR parser. The GLR parser essentially does unlimited lookahead (trying all possibilities when there is a conflict), which can result in multiple parses. You need to put %dprec modifiers on the rules to resolve the ambiguities in favor of one parse or the other, or %merge directives to keep both parses around and combine their results with a function you supply.