0

I want to implement the Parser for proposition logic which has the following operators in decreasing order of precedence:

  1. NOT p
  2. p AND q
  3. p OR q
  4. IF p THEN q
  5. p IFF q
  6. IF p THEN q ELSE r

The main issue is with the IF-THEN-ELSE operator. Without it, I am able to write the grammar properly. Presently my yacc file looks like

%term
    PARSEPROG | AND | NOT | OR | IF | THEN | ELSE | IFF | LPAREN | RPAREN | ATOM of string | SEMICOLON | EOF

%nonterm
    start of Absyn.program | EXP of Absyn.declaration

%start start
%eop EOF SEMICOLON
%pos int
%verbose

%right ELSE
%right IFF
%right THEN
%left AND OR
%left NOT

%name Fol

%noshift EOF

%%

start : PARSEPROG EXP (Absyn.PROGRAM(EXP))


EXP: ATOM ( Absyn.LITERAL(ATOM) )
    | LPAREN EXP RPAREN (EXP)
    | EXP AND EXP ( Absyn.CONJ(EXP1, EXP2) )
    | EXP OR EXP ( Absyn.DISJ(EXP1, EXP2) )
    | IF EXP THEN EXP ELSE EXP ( Absyn.IFTHENELSE(EXP1, EXP2, EXP3) )
    | IF EXP THEN EXP ( Absyn.IMPLI(EXP1, EXP2) )
    | EXP IFF EXP ( Absyn.BIIMPLI(EXP1, EXP2) )
    | NOT EXP ( Absyn.NEGATION(EXP) )

But I don't seem to get the correct idea how to eliminate reduce-shift conflicts. Some examples of correct parsing are:

  1. IF a THEN IF b THEN c________a->(b->c)
  2. IF a THEN IF b THEN c ELSE d IFF e OR f_______IFTHENELSE(a,b->c,d<=>e/\f)

Any help/pointers will be really helpful. Thanks.

Deepak Saini
  • 2,810
  • 1
  • 19
  • 26
  • See the second part of [this answer](https://stackoverflow.com/questions/32821775/bison-shift-reduce-conflict-reduce-reduce-conflict-warnings/32826695) Or search fir "dangling else" – rici Sep 08 '17 at 02:32
  • The answer mentions how to match `IF a THEN IF b THEN c ELSE d` to `IF a THEN IFTHENELSE(b,c,d)` but I want it to be matched to `IFTHENELSE(a,IFTHEN(b,c),d)` – Deepak Saini Sep 08 '17 at 02:56
  • That parse cannot be found by an LR(k) parser, since it is impossible to know whether to reduce `IF b THEN c` until the second following `ELSE` is encountered (or not), which means that it cannot be predicted with finite lookahead. – rici Sep 08 '17 at 03:26
  • However, an unambiguous grammar does exist, which you could parse with a GLR parser. See https://cs.stackexchange.com/a/68880/4416 – rici Sep 08 '17 at 03:32
  • Yeah, Got it. Then I guess the parsing can't be done in ML-yacc as it is an LALR parser. Anyways, thanks a lot. – Deepak Saini Sep 08 '17 at 03:56
  • Try giving `ELSE` a *higher* precedence than `THEN`. You have the precedence the wrong way around. When `ELSE` is seen, we want to shift that token, so that it goes with the immediately preceding `IF`. – Kaz Oct 19 '17 at 23:04
  • @rici The class if/then/else problem can certainly be parsed by LALR(1), and will be in fact handled right by default in any implementation that resolves shift/reduce in favor of shift (like Yacc). In Yacc, you can make the `else` token have higher precedence than `then`, which will get rid of the diagnostic. Bison Manual: https://www.gnu.org/software/bison/manual/bison.html#Non-Operators – Kaz Oct 19 '17 at 23:08
  • @kaz: You are misreading the question, I'm afraid. I did too, at first. See Deepak's first comment, above. They don't want the normal resolution. – rici Oct 19 '17 at 23:16
  • @rici That's a stupid requirement, since it's the other way in every other mainstream computer language. Users will be confused and hate it. Giving ELSE a low precedence has no practical value; it's just an exercise in making your Yacc sit up and beg. – Kaz Oct 20 '17 at 14:57
  • @rici I had stared at the examples in the question and comments and just didn't see they were going for the unusual parse! – Kaz Oct 20 '17 at 16:01
  • @kaz, anyway, inverting the precedence of ELSE can only make else clauses impossible to parse; it cannot make them match a different IF. The unambiguous grammar for the outermost parse which I proposed was in an answer to the [cs.se] site, which deals with CS theory, not necessarily related to practical programming. – rici Oct 20 '17 at 16:07
  • @rici Sorry about the underestimation, rici! – Kaz Oct 20 '17 at 16:09
  • @rici I just posted a complete solution based on the very simple trick of over-generating, and then using a simple, localized semantic check on the AST to constrain the syntax. – Kaz Oct 20 '17 at 19:15

2 Answers2

1

Making my Yacc sit up and beg

I'm more convinced than ever that the correct approach here is a GLR grammar, if at all possible. However, inspired by @Kaz, I produced the following yacc/bison grammar with an LALR(1) grammar (not even using precedence declarations).

Of course, it cheats, since the problem cannot be solved with an LALR(1) grammar. At appropriate intervals, it walks the constructed tree of IF THEN and IF THEN ELSE expressions, and moves the ELSE clauses as required.

Nodes which need to be re-examined for possible motion are given the AST nodetype IFSEQ and the ELSE clauses are attached with the traditional tightest match grammar, using a classic matched-if/unmatched-if grammar. A fully-matched IF THEN ELSE clause does not need to be rearranged; the tree rewrite will apply to the expression associated with the first ELSE whose right-hand operand is unmatched (if there is one). Keeping the fully-matched prefix of an IF expression separate from the tail which needs to be rearranged required almost-duplicating some rules; the almost-duplicated rules differ in that their actions directly produce TERNARY nodes instead if IFSEQ nodes.

In order to correctly answer the question, it would also be necessary to rearrange some IFF nodes, since the IFF binds more weakly than the THEN clause and more tightly than the ELSE clause. I think this means:

IF p THEN q IFF IF r THEN s  ==>  ((p → q) ↔ (r → s))
IF p THEN q IFF r ELSE s IFF t ==> (p ? (q ↔ r) : (s ↔ t))
IF p THEN q IFF IF r THEN s ELSE t IFF u ==> (p ? (q ↔ (r → s)) : (t ↔ u))

although I'm not sure that is what is being asked for (particularly the last one) and I really don't think it's a good idea. In the grammar below, if you want IFF to apply to an IF p THEN q subexpression, you will have to use parentheses; IF p THEN q IFF r produces p → (q ↔ r) and p IFF IF q THEN r is a syntax error.

Frankly, I think this whole thing would be easier using arrows for conditionals and biconditionals (as in the glosses above), and using IF THEN ELSE only for ternary selector expressions (written above with C-style ? : syntax, which is another possibility). That will generate far fewer surprises. But it's not my language.

One solution for the biconditional operator with floating precedence would be to parse in two passes. The first pass would only identify the IF p THEN q operators without an attached ELSE, using a mechanism similar to the one proposed here, and change them to p -> q by deleting the IF and changing the spelling of THEN. Other operators would not be parsed and parentheses would be retained. It would then feed to resulting token stream into a second LALR parser with a more traditional grammar style. I might get around to coding that only because I think that two-pass bison parsers are occasionally useful and there are few examples floating around.

Here's the tree-rewriting parser. I apologise for the length:

%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void yyerror(const char* msg);
int yylex(void);

typedef struct Node Node;
enum AstType { ATOM, NEG, CONJ, DISJ, IMPL, BICOND, TERNARY,
               IFSEQ
};
struct Node {
  enum AstType type;
  union {
    const char* atom;
    Node* child[3];
  };
};
Node* node(enum AstType type, Node* op1, Node* op2, Node* op3);
Node* atom(const char* name);
void  node_free(Node*);
void  node_print(Node*, FILE*);

typedef struct ElseStack ElseStack;
struct ElseStack {
  Node* action;
  ElseStack* next;
};

ElseStack* build_else_stack(Node*, ElseStack*);
ElseStack* shift_elses(Node*, ElseStack*);
%}

%union {
  const char* name;
  struct Node* node;
}

%token <name> T_ID
%token T_AND  "and"
       T_ELSE "else"
       T_IF   "if"
       T_IFF  "iff"
       T_NOT  "not"
       T_OR   "or"
       T_THEN "then"
%type <node> term conj disj bicond cond mat unmat tail expr

%%
prog : %empty | prog stmt;
stmt : expr '\n'       { node_print($1, stdout); putchar('\n'); node_free($1); }
     | '\n'
     | error '\n'
term : T_ID            { $$ = atom($1); }
     | "not" term      { $$ = node(NEG, $2, NULL, NULL); }
     | '(' expr ')'    { $$ = $2; }
conj : term
     | conj "and" term { $$ = node(CONJ, $1, $3, NULL); }
disj : conj
     | disj "or" conj  { $$ = node(DISJ, $1, $3, NULL); }
bicond: disj
     | disj "iff" bicond { $$ = node(BICOND, $1, $3, NULL); }
mat  : bicond
     | "if" expr "then" mat "else" mat
                       { $$ = node(IFSEQ, $2, $4, $6); }
unmat: "if" expr "then" mat
                       { $$ = node(IFSEQ, $2, $4, NULL); }
     | "if" expr "then" unmat
                       { $$ = node(IFSEQ, $2, $4, NULL); }
     | "if" expr "then" mat "else" unmat
                       { $$ = node(IFSEQ, $2, $4, $6); }
tail : "if" expr "then" mat
                       { $$ = node(IFSEQ, $2, $4, NULL); }
     | "if" expr "then" unmat
                       { $$ = node(IFSEQ, $2, $4, NULL); }
cond : bicond
     | tail            { shift_elses($$, build_else_stack($$, NULL)); }
     | "if" expr "then" mat "else" cond
                       { $$ = node(TERNARY, $2, $4, $6); }
expr : cond

%%

/* Walk the IFSEQ nodes in the tree, pushing any
 * else clause found onto the else stack, which it
 * returns. 
 */
ElseStack* build_else_stack(Node* ifs, ElseStack* stack) {
  if (ifs && ifs->type != IFSEQ) {
    stack = build_else_stack(ifs->child[1], stack);
    if (ifs->child[2]) { 
      ElseStack* top = malloc(sizeof *top);
      *top = (ElseStack) { ifs->child[2], stack };
      stack = build_else_stack(ifs->child[2], top);
    }
  }
  return stack;
}
/* Walk the IFSEQ nodes in the tree, attaching elses from
 * the else stack.
 * Pops the else stack as it goes, freeing popped 
 * objects, and returns the new top of the stack.
 */
ElseStack* shift_elses(Node* n, ElseStack* stack) {
  if (n && n->type == IFSEQ) {
    if (stack) {
      ElseStack* top = stack;
      stack = shift_elses(n->child[2],
                          shift_elses(n->child[1], stack->next));
      n->type = TERNARY;
      n->child[2] = top;
      free(top);
    }
    else {
      shift_elses(n->child[2],
                  shift_elses(n->child[1], NULL));
      n->type = IMPL; 
      n->child[2] = NULL;
    }
  }
  return stack;
}
  
Node* node(enum AstType type, Node* op1, Node* op2, Node* op3) {
  Node* rv = malloc(sizeof *rv);
  *rv = (Node){type, .child = {op1, op2, op3}};
  return rv;
}

Node* atom(const char* name) {
  Node* rv = malloc(sizeof *rv);
  *rv = (Node){ATOM, .atom = name};
  return rv;
}

void node_free(Node* n) {
  if (n) {
    if (n->type == ATOM) free((char*)n->atom);
    else for (int i = 0; i < 3; ++i) node_free(n->child[i]);
    free(n);
  }
}

const char* typename(enum AstType type) {
  switch (type) {
    case ATOM:    return "ATOM";
    case NEG:     return "NOT" ;
    case CONJ:    return "CONJ";
    case DISJ:    return "DISJ";
    case IMPL:    return "IMPL";
    case BICOND:  return "BICOND";
    case TERNARY: return "TERNARY" ;
    case IFSEQ:   return "IF_SEQ";
  }
  return "**BAD NODE TYPE**";
}

void node_print(Node* n, FILE* out) {
  if (n) {
    if (n->type == ATOM)
      fputs(n->atom, out);
    else {
      fprintf(out, "(%s", typename(n->type));
      for (int i = 0; i < 3 && n->child[i]; ++i) {
        fputc(' ', out); node_print(n->child[i], out);
      }
      fputc(')', out);
    }
  }
}

void yyerror(const char* msg) {
  fprintf(stderr, "%s\n", msg);
}

int main(int argc, char** argv) {
  return yyparse();
}

The lexer is almost trivial. (This one uses lower-case keywords because my fingers prefer that, but it's trivial to change.)

%{
#include "ifelse.tab.h"
%}

%option noinput nounput noyywrap nodefault

%%

and          { return T_AND;  }
else         { return T_ELSE; }
if           { return T_IF;   }
iff          { return T_IFF;  }
not          { return T_NOT;  }
or           { return T_OR;   }
then         { return T_THEN; }

[[:alpha:]]+ { yylval.name = strdup(yytext);
               return T_ID;   }

([[:space:]]{-}[\n])+ ;
\n           { return '\n';   }
.            { return *yytext;}

As written, the parser/lexer reads a line at a time, and prints the AST for each line (so multiline expressions aren't allowed). I hope it's clear how to change it.

Community
  • 1
  • 1
rici
  • 234,347
  • 28
  • 237
  • 341
  • Interesting. Both `tail` productions are replicated exactly in `unmat`; I think `tail` an just generate `unmat` in place of those two. – Kaz Oct 24 '17 at 14:54
  • @kaz: yes. It would also be possible to collect `mat | unmat` into a non-terminal (which is common in the standard presentation). – rici Oct 24 '17 at 15:03
0

A relatively easy way to deal with this requirement is to create a grammar which over-generates, and then reject the syntax we don't want using semantics.

Concretely, we use a grammar like this:

expr : expr AND expr
     | expr OR expr
     | expr IFF expr
     | IF expr THEN expr
     | expr ELSE expr   /* generates some sentences we don't want! */
     | '(' expr ')'
     | ATOM
     ;

Note that ELSE is just an ordinary low precedence operator: any expression can be followed by ELSE and another expression. But in the semantic rule, we implement a check that the left side of ELSE is an IF expression. If not, then we raise an error.

This approach is not only easy to implement, but easy to document for the end-users and consequently easy to understand and use. The end user can accept the simple theory that ELSE is just another binary operator with a very low precedence, along with a rule which rejects it when it's not combined with IF/THEN.

Here is a test run from a complete program I wrote (using classic Yacc, in C):

$ echo 'a AND b OR c' | ./ifelse 
OR(AND(a, b), c)
$ echo 'a OR b AND c' | ./ifelse 
OR(a, AND(b, c))
$ echo 'IF a THEN b' | ./ifelse 
IF(a, b)

Ordinary single IF/ELSE does what we want:

$ echo 'IF a THEN b ELSE c' | ./ifelse 
IFELSE(a, b, c)

The key thing that you're after:

$ echo 'IF a THEN IF x THEN y ELSE c' | ./ifelse
IFELSE(a, IF(x, y), c)

correctly, the ELSE goes with the outer IF. Here is the error case with bad ELSE:

$ echo 'a OR b ELSE c' | ./ifelse 
error: ELSE must pair with IF
<invalid>

Here is parentheses to force the usual "else with closest if" behavior:

$ echo 'IF a THEN (IF x THEN y ELSE c)' | ./ifelse 
IF(a, IFELSE(x, y, c))

The program shows what parse it is using by building an AST and then walking it to print it in prefix F(X, Y) syntax. (For which as a Lisp programmer, I had to hold back the gagging reflex a little bit).

The AST structure is also what allows the ELSE rule to detect whether its left argument is an expression of the correct kind.

Note: You might want the following to be handled, but it isn't:

$ echo 'IF a THEN IF x THEN y ELSE z ELSE w' | ./ifelse 
error: ELSE must pair with IF
<invalid>

The issue here is that the ELSE w is being paired with an IFELSE expression.

A more sophisticated approach is possible that might be interesting to explore. The parser can treat ELSE as an ordinary binary operator and generate the AST that way. Then a whole separate walk can check the tree for valid ELSE usage and transform it as necessary. Or perhaps we can play here with the associativity of ELSE and treat cascading ELSE in the parser action in some suitable way.

The complete source code, which I saved in a file called ifelse.y and built using:

$ yacc ifelse.y
$ gcc -o ifelse y.tab.c

is here:

%{

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>

typedef struct astnode {
  int op;
  struct astnode *left, *right;
  char *lexeme;
} astnode;

void yyerror(const char *s)
{
  fprintf(stderr, "error: %s\n", s);
}

void *xmalloc(size_t size)
{
  void *p = malloc(size);
  if (p)
    return p;

  yyerror("out of memory");
  abort();
}

char *xstrdup(char *in)
{
  size_t sz = strlen(in) + 1;
  char *out = xmalloc(sz);
  return strcpy(out, in);
}

astnode *astnode_cons(int op, astnode *left, astnode *right, char *lexeme)
{
  astnode *a = xmalloc(sizeof *a);
  a->op = op;
  a->left = left;
  a->right = right;
  a->lexeme = lexeme;
  return a;
}

int yylex(void);

astnode *ast;

%}

%union {
  astnode *node;
  char *lexeme;
  int none;
}

%token<none> '(' ')'

%token<lexeme> ATOM

%left<none> ELSE
%left<none> IF THEN
%right<none> IFF
%left<none> OR
%left<none> AND

%type<node> top expr

%%

top : expr { ast = $1; }

expr : expr AND expr
       { $$ = astnode_cons(AND, $1, $3, 0); }
     | expr OR expr
       { $$ = astnode_cons(OR, $1, $3, 0); }
     | expr IFF expr
       { $$ = astnode_cons(IFF, $1, $3, 0); }
     | IF expr THEN expr
       { $$ = astnode_cons(IF, $2, $4, 0); }
     | expr ELSE expr
       { if ($1->op != IF)
         { yyerror("ELSE must pair with IF");
           $$ = 0; }
         else
         { $$ = astnode_cons(ELSE, $1, $3, 0); } }
     | '(' expr ')'
       { $$ = $2; }
     | ATOM
       { $$ = astnode_cons(ATOM, 0, 0, $1); }
     ;

%%

int yylex(void)
{
  int ch;
  char tok[64], *te = tok + sizeof(tok), *tp = tok;

  while ((ch = getchar()) != EOF) {
    if (isalnum((unsigned char) ch)) {
      if (tp >= te - 1)
        yyerror("token overflow");
      *tp++ = ch;
    } else if (isspace(ch)) {
      if (tp > tok)
        break;
    } else if (ch == '(' || ch == ')') {
      if (tp == tok)
        return ch;
      ungetc(ch, stdin);
      break;
    } else {
      yyerror("invalid character");
    }
  }

  if (tp > tok) {
    yylval.none = 0;
    *tp++ = 0;
    if (strcmp(tok, "AND") == 0)
      return AND;
    if (strcmp(tok, "OR") == 0)
      return OR;
    if (strcmp(tok, "IFF") == 0)
      return IFF;
    if (strcmp(tok, "IF") == 0)
      return IF;
    if (strcmp(tok, "THEN") == 0)
      return THEN;
    if (strcmp(tok, "ELSE") == 0)
      return ELSE;
    yylval.lexeme = xstrdup(tok);
    return ATOM;
  }

  return 0;
}

void ast_print(astnode *a)
{
  if (a == 0) {
    fputs("<invalid>", stdout);
    return;
  }

  switch (a->op) {
  case ATOM:
    fputs(a->lexeme, stdout);
    break;
  case AND:
  case OR:
  case IF:
  case IFF:
    switch (a->op) {
    case AND:
      fputs("AND(", stdout);
      break;
    case OR:
      fputs("OR(", stdout);
      break;
    case IF:
      fputs("IF(", stdout);
      break;
    case IFF:
      fputs("IFF(", stdout);
      break;
    }
    ast_print(a->left);
    fputs(", ", stdout);
    ast_print(a->right);
    putc(')', stdout);
    break;
  case ELSE:
    fputs("IFELSE(", stdout);
    ast_print(a->left->left);
    fputs(", ", stdout);
    ast_print(a->left->right);
    fputs(", ", stdout);
    ast_print(a->right);
    putc(')', stdout);
    break;
  }
}

int main(void)
{
   yyparse();
   ast_print(ast);
   puts("");
   return 0;
}
Kaz
  • 55,781
  • 9
  • 100
  • 149
  • That's a plausible approach but this implementation fails on `IF a THEN IF b THEN x ELSE y ELSE z`, which I think is expected to work. Although who knows. – rici Oct 20 '17 at 19:31
  • Oops, you edited to mention that fact. Sorry, didn't see the edit before I commented. – rici Oct 20 '17 at 19:35
  • @rici Can you think of osme way to propagating the ELSE's down the tree or something? The first part `IF a THEN IF b THEN x ELSE y` gets treated as a unit, turned into an ifelse node. Then another ELSE occurs. At that point we can detect that we have an ELSE being applied to an already transformed IFELSE. We can rearrange the ifelse such that we push the ELSE to an inner if (which has to exist, or we throw an error). That then opens up a place for the outer ELSE to "slide in". – Kaz Oct 20 '17 at 20:01
  • It should be possible without the need to check for validity in the semantic action, as long as the entire mess of IFs and ELSEs form a single semantic entity until they are forced to become an `expr`. The semantic unit has a queue of IFs and a stack of ELSEs. When that's converted to an expr, the queue and stack are walked in parallel, matching each ELSE with the outermost IF. – rici Oct 20 '17 at 20:21
  • @rici I'm trying some exploratory Lisp code which takes expressions like `(else (foo ...) bar)` and converts them into `(foo ...)` where `foo` has been transformed by the insertion of `bar` into an embedded `if` form. The procedure is repeated until no more `else` remains. – Kaz Oct 20 '17 at 20:31
  • @rici One little problem in the above implementation is that we can't be ignoring the parentheses in the AST, because then we will add the `ELSE` fragment through parentheses. – Kaz Oct 21 '17 at 15:33