2

I am struggling with a grammar that involves typed expressions as well as variable access. The result type of this access is not ascertainable during parsing and is evaluated in a second step. This evaluation is not a problem, but it seems hard to write unambiguous parser rules.

All operations that work on different types (e.g. compare operators) produce a reduce/reduce conflict. Clearly, this is because the parser can not decide if "x.a = y.b" should be parsed as "bool_expr EUQAL bool_expr" or as "num_expr EUQAL num_expr" because the type is uncertain. However, the result type of the comp_op rule is certain (as it is always a boolean value).

Is there any solution to this problem without throwing all type information away during parsing and always check it in the evaluation phase?

Here is a shortened grammar example (using ocamllex and ocamlyacc):

comp_op:
  | bool_expr EQUAL bool_expr  { T.Equiv (T.Wrapper $1, T.Wrapper $3) }
  | num_expr EQUAL num_expr    { T.Equiv (T.Wrapper $1, T.Wrapper $3) }
  /* the wrapper type basically just wraps the expressions to get a common type */

bool_expr:
  | TRUE                       { T.Bool true }
  | FALSE                      { T.Bool false }
  | access                     { T.BoolAccess $1 }

num_expr:
  | NUMBER                     { T.Num $1 }
  | access                     { T.NumAccess $1 }

access:
  /* some more complex rules describing the access to variables */

Thanks for your help.

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
Fynn
  • 4,753
  • 3
  • 32
  • 69
  • Why do you want to mix parsing and typing? I guess it is better to treat this question as an example of "why not to do so". – ygrek Mar 21 '12 at 12:45

2 Answers2

2

As ygrek says, you should not try to mix parsing and typing. It's much easier to write your parser with only one syntactic category for expressions, and then to have a separate type-checking pass that will sort that out.

Theoretically, this comes from the fact that the distinctions made by typing rules are much finer than what traditional parsing technologies can express. They have been attempt at specifying typing rules more declaratively using, for example, attribute grammars, but your usual LL/LR technology is certainly not a good fit, it's like parsing nested parentheses with a regular expression.

Finally, you should use menhir instead of ocamlyacc, because it's just better. You will have more readable and expressive grammars (named parameters, parametrized rules...), better error reporting and grammar debugging features.

gasche
  • 31,259
  • 3
  • 78
  • 100
  • plus one for menhir instead of ocamlyacc, but there are many times (for example, when you're a student in a compilers course..) where you *have* to use ocamlyacc, unfortunately :-( – Kristopher Micinski Mar 21 '12 at 13:33
  • 1
    @KristopherMicinski: please send feedback to your teacher that you wish to use menhir instead, or even that (s)he should upgrade the course documents to use menhir instead. – gasche Mar 21 '12 at 13:51
  • It's my PhD advisor, not teacher..., and the reason we've used ocamlyacc is simply because we're more familiar with it for now, but I think we'll consider switching over soon enough.. – Kristopher Micinski Mar 21 '12 at 14:55
0

As already said, you will have a hard time writing a "type-correct parser" -- depending on your language this might even be impossible.

Anyway, the problem here is, that your grammar does not know the type of the "access" production; as far as I understood, this production resembles reading from variables, the type of which is unknown at parse-time. The way I see it, you either abandon the 100% type-correct parsing OR you find a way of "magically" knowing the type of your variables. You could keep track of type declarations and let the lexer look up the type of a variable it encounters; the lexer then would send a variable-identifier-token based on the type of the variable.

I'm not sure if this approach works as I don't know what your language looks like.

lambdapower
  • 1,022
  • 6
  • 12