3

I'm trying to create grammar for GNU MathProg language from glpk package https://www3.nd.edu/~jeff/mathprog/glpk-4.47/doc/gmpl.pdf
Unfortunately grammar I've written so far is ambiguous. I don't know how to tell bison which branch of parsing tree is correct when some identifier is used. For example:

numericExpression : numericLiteral
              | identifier
              | numericFunctionReference
              | iteratedNumericExpression
              | conditionalNumericExpression
              | '(' numericExpression ')' %prec PARENTH
              | '-' numericExpression %prec UNARY
              | '+' numericExpression %prec UNARY
              | numericExpression binaryArithmeticOperator numericExpression
              ;

symbolicExpression : stringLiteral
               | symbolicFunctionReference
               | identifier
               | conditionalSymbolicExpression
               | '(' symbolicExpression ')'  %prec PARENTH
               | symbolicExpression '&' symbolicExpression
               ;

indexingExpression : '{' indexingEntries '}'
               | '{' indexingEntries ':' logicalExpression '}'
               ;

setExpression : literalSet
          | identifier
          | aritmeticSet
          | indexingExpression
          | iteratedSetExpression
          | conditionalSetExpression
          | '(' setExpression ')'  %prec PARENTH
          | setExpression setOperator setExpression
          ;


numericLiteral : INT
           | FLT
           ;

 linearExpression : identifier
             | iteratedLinearExpression
             | conditionalLinearExpression
             | '(' linearExpression ')'  %prec PARENTH
             | '-' linearExpression  %prec UNARY
             | '+' linearExpression  %prec UNARY
             | linearExpression '+' linearExpression
             | linearExpression '-' linearExpression
             | linearExpression '*' numericExpression
             | numericExpression '*' linearExpression
             | linearExpression '/' numericExpression
             ;

logicalExpression : numericExpression
              | relationalExpression
              | iteratedLogicalExpression
              | '(' logicalExpression ')'  %prec PARENTH
              | NOT logicalExpression %prec NEG
              | logicalExpression AND logicalExpression
              | logicalExpression OR logicalExpression
              ;

identifier : SYMBOLIC_NAME
       | SYMBOLIC_NAME '[' listOfIndices ']'
       ;

listOfIndices : SYMBOLIC_NAME
    | listOfIndices ',' SYMBOLIC_NAME
    ;

Identifier is simply name of 'variable'. Variable has a specific type (parameter, set, decision variable) and might be indexed. In code programmer has to declare variable type in statements like eg.

param p1;
param p2{1, 2} >=0;
set s1;
set s2{i in 1..5};
var v1 >=0;
var v2{S1,S2};

But when bison sees identifier doesn't know which rule should use and i'm getting reduce/reduce conflicts like

113 numericExpression: identifier .
123 symbolicExpression: identifier .

'&'          reduce using rule 123 (symbolicExpression)
ELSE         reduce using rule 113 (numericExpression)
ELSE         [reduce using rule 123 (symbolicExpression)]
INTEGER      reduce using rule 113 (numericExpression)
INTEGER      [reduce using rule 123 (symbolicExpression)]
BINARY       reduce using rule 113 (numericExpression)
BINARY       [reduce using rule 123 (symbolicExpression)]
ASIGN        reduce using rule 113 (numericExpression)
ASIGN        [reduce using rule 123 (symbolicExpression)]
','          reduce using rule 113 (numericExpression)
','          [reduce using rule 123 (symbolicExpression)]
'>'          reduce using rule 113 (numericExpression)
'>'          [reduce using rule 123 (symbolicExpression)]
'}'          reduce using rule 113 (numericExpression)
'}'          [reduce using rule 123 (symbolicExpression)]

113 numericExpression: identifier .
123 symbolicExpression: identifier .
130 setExpression: identifier .

UNION           reduce using rule 130 (setExpression)
DIFF            reduce using rule 130 (setExpression)
SYMDIFF         reduce using rule 130 (setExpression)
ELSE            reduce using rule 113 (numericExpression)
ELSE            [reduce using rule 123 (symbolicExpression)]
ELSE            [reduce using rule 130 (setExpression)]
WITHIN          reduce using rule 130 (setExpression)
IN              reduce using rule 113 (numericExpression)
IN              [reduce using rule 123 (symbolicExpression)]

I have also other problems but this one is blocker for me

Vadim Kotov
  • 8,084
  • 8
  • 48
  • 62
Lisu
  • 45
  • 4
  • Can you also show your parser? I'm trying to read through your grammar and keep looking for "param" "var" strings –  May 01 '16 at 15:01
  • here is entire bison file. Keep in mind it is not finshed yet. There are some operators precedences missing and some tabular rules. http://wklej.org/id/2370457/ – Lisu May 01 '16 at 15:37
  • I feel like branching out at expressions semantically rather than categorically tends to produce problems like that. Wouldn't having "numericExpression" and "logicalExpression" (for example) be sufficient? both of those often evaluate to the same thing: "expression". You use those more specific expressions to just validate. –  May 01 '16 at 16:00
  • As a general rule, trying to do typechecking in the grammar is a bad idea. It's generally much better to write a general grammar that can parse any expression of any (combination of) types, and then have a separate typechecker that checks for inconsistent types. This is easier to do, easier to understand, and gives much better error messages. – Chris Dodd May 01 '16 at 18:58

1 Answers1

1

Basically the problem is that identifier appears in multiple rules:

numericExpression : identifier
symbolicExpression : identifier
setExpression: identifier

which may apply in the same context. One way to resolve this is to introduce different token types for set names and scalar (parameters and variables) names:

symbolicExpression : SCALAR_NAME
setExpression: SET_NAME

This will resolve conflict with set names. I don't think that you need this rule

numericExpression : identifier

because there is an automatic conversion from strings to numbers in AMPL and therefore MathProg, which is a subset of AMPL, so symbolicExpression should be allowed in the context of numericExpression.

Note that the grammar is not context-free, so you'll need to pull additional information like the name category above from the symbol table to resolve problems like this.

My flex is a bit rusty but I think you can do something like that in an identifier rule:

return is_setname(...) ? TOK_SET_NAME : TOK_SCALAR_NAME;

where is_setname is a function to check whether the current identifier is a set. You'll need to define such function and get the necessary information from a symbol table.

vitaut
  • 49,672
  • 25
  • 199
  • 336
  • How would You distinguish SCALAR_NAME and SET_NAME. Both of them are simply generated by flex regular expression > SYMBOLIC_NAME [a-zA-Z_][a-zA-Z0-9_]* Maybe i should do some pre-lex analysis and add prefixes like param_name1, var_name2 – Lisu May 01 '16 at 20:49
  • I think you should be able to do this in a flex rule by returning an appropriate token. I've updated the answer. – vitaut May 01 '16 at 21:20