0

there are two rules that matches a string, but if two rules can parse this string I need to use the first one!

to be specific I'm writing bison parser for C++ like language, the problem is that when I use pointer declaration like

ClassName* a;

bison parses it as multiplication operator, not deceleration ! I need to set priority to the declaration node higher then multiplication(expression).

Khaledvic
  • 534
  • 4
  • 16
  • Mind showing your EBNF?? – πάντα ῥεῖ May 07 '14 at 22:06
  • it's very simple like this statement:|expr SEMI_COMA |variable_declaration_block – Khaledvic May 07 '14 at 22:09
  • This won't solve any problem. What if the person did `foo* a` and really meant to multiply `foo` and `a`? – zneak May 07 '14 at 22:09
  • It would solve the problem since I don't want a statement to contain only a multiplication operator, if it is only a multiplication operator then it's a declaration. any way I'd like to now how it's solved in c++ ? – Khaledvic May 07 '14 at 22:12
  • 2
    C++ parsers are extremely involved programs that [cannot be expressed with Bison](http://stackoverflow.com/questions/14589346/is-c-context-free-or-context-sensitive). – zneak May 07 '14 at 22:13
  • AFAIR (but that's literally decades ago) bison tells you about conflicts in deduction of rules. You'll have to introduce some context and choose explicit precedence of rules. But I agree with @zneak, it might turn out that bison isn't suitable to write a c++ parser. You might better consider writing a [clang plugin](http://clang.llvm.org/docs/Tooling.html), depending on whatever your actual use case is. – πάντα ῥεῖ May 07 '14 at 22:13
  • @zneak +1, I hope to find a way in bison to solve this. – Khaledvic May 07 '14 at 22:20
  • @πάνταῥεῖ I tried to use %nonassoc with no luck – Khaledvic May 07 '14 at 22:21
  • @Khaledvic _'I tried to use %nonassoc with no luck'_ That's why I asked for your actual bison code snippet ... – πάντα ῥεῖ May 07 '14 at 22:23
  • @zneak: My impression is that bison can handle non-CFL languages by executing arbitrary code at various places that can direct parsing. – kec May 07 '14 at 22:29
  • 1
    We know now that language designs like C/C++ are a serious mistake (because, among other reasons, they can't be unambiguously parsed without knowing which names are types) ... why try to emulate such mistakes? – Jim Balter May 07 '14 at 22:32

2 Answers2

3

The usual way to handle this in a yacc-style LALR(1) is to have the lexer determine when an identifier is a type name by looking it up in the current symbol table and returning a different token for type names. So you end up with rules like:

declaration: decl_specs declarator_list ;
decl_specs: TYPE_NAME | ....
declarator: ID | TYPE_NAME | '*' declarator ...
expression: ID |  expression '*' expression | ...

Since you can't use a name that has been declared as a type name as a value in an expression, this works adequately well for parsing C.

When trying to parse C++, this doesn't work well, both because you have explicit namespace operations (with ::) and because you can have declarations that look like function calls. In general, in C++ you're going to need more than 1 token of lookahead to resolve some of these things, so LALR(1) parsers don't work too well.

With bison, you can use the %glr-parser option to generate a GLR parser instead of an LALR parser. The GLR parser essentially does unlimited lookahead (trying all possibilities when there is a conflict), which can result in multiple parses. You need to put %dprec modifiers on the rules to resolve the ambiguities in favor of one parser or the other, or %merge directives to keep both parses around.

Chris Dodd
  • 119,907
  • 13
  • 134
  • 226
  • +1 on the GLR suggestion but it won't help with `namespace::something* a`; you still need to know if `namespace::something` is a type or not (but it must be already declared, so that's possible -- although if it's a template, it might be really a pain). GLR merge precedence will help with the "declarations win over function calls" problem, which is a real ambiguity, not an absence of sufficient lookahead. – rici May 08 '14 at 05:52
1

From what I understand of your problem, you're willing to give up the possibility of having statements with no obvious side effects, like arithmetic operations without assignment. The solution for you would be to redefine the statement rule to include only some expressions (namely, calls and assignments).

I'm assuming that your grammar kind of looks like this (in pseudo-Bison):

statement:
      expression ';'
    | declaration

expression:
      token
    | expression '+' expression
    | expression '-' expression
    | expression '*' expression
    | expression '/' expression
    | expression '=' expression
    | expression '(' param_list ')' // function call

Instead of making expression ';' a valid statement, you should split the expression rule to make for a set that can form a valid statement, and this set would exclude arithmetic operations.

statement:
      top_level_expression ';'
    | declaration

top_level_expression:
      expression '=' expression
    | expression '(' param_list ')' // function call

expression:
      token
    | top_level_expression
    | expression '+' expression
    | expression '-' expression
    | expression '*' expression
    | expression '/' expression
zneak
  • 134,922
  • 42
  • 253
  • 328