How to incorporate ++ operator in yacc

Question

In Chapter 8 of the classic Kernighan & Pike book (The UNIX Programming Environment) hoc1 is presented as an example of simple calculator using yacc.

The book suggest to improve the source code to cover the unitary increment operator (++).

I have tried to implement the ++ operator in the yacc source code (which includes the lex functions and does not use the companion lex program) as follows:

%{
#define YYSTYPE double          /* DATA TYPE OF YACC STACK */ 
%}

%token NUMBER
%left '+' '-'                   /* LEFT ASSOCIATIVE, SAME PRECEDENCE */ 
%left '*' '/'                   /* LEFT ASSOC., HIGHER PRECEDENCE */ 
%left UNARYMINUS                /* LEFT ASSOC., HIGHER PRECEDENCE */ 
%right INC

/* GRAMMAR */

%% 
list:     /* nothing */ 
        | list '\n' 
        | list expr '\n'        { printf("\t%.8g\n", $2); } 

expr:     NUMBER                        { $$ = $1; } 
        | expr INC                      { $$ = $1 + 1; } 
        | '-' expr %prec UNARYMINUS     { $$ = -$2; }
        | expr '+' expr                 { $$ = $1 + $3; } 
        | expr '-' expr                 { $$ = $1 - $3; } 
        | expr '*' expr                 { $$ = $1 * $3; } 
        | expr '/' expr                 { $$ = $1 / $3; } 
        | expr '%' expr                 { $$ = fmod($1, $3); } 
        | '(' expr ')'                  { $$ = $2; }
%% 

/* GRAMMAR PROCESSING ROUTINES */

#include <stdio.h>
#include <ctype.h>
#include <math.h>

int yylex(void);
void yyerror(char *s);
void warning(char *s, char *t);

char    *progname;
int     lineno = 1;

int main(int argc, char *argv[]) {

        progname = argv[0];
        yyparse();
}

int yylex() {
        int c;

        while((c=getchar()) == ' ' || c == '\t')
                ;

        if(c==EOF) {
                return 0;
        }
        if(c=='.' || isdigit(c)) {
                ungetc(c,stdin);
                scanf("%lf", &yylval);
                return NUMBER;
        }
        if(c=='+') {
                if((c=getchar()) == '+') {
                        return INC;
                } else {
                        ungetc(c,stdin);
                        return '+';
                }
        }
        if(c=='\n')
                lineno++;
        return c;
}

void yyerror(char *s) {
        warning(s, (char *)0);
}

void warning(char *s, char *t) {
        fprintf(stderr, "%s: %s", progname, s);
        if(t) {
                fprintf(stderr, " %s", t);
        }
        fprintf(stderr, " near line %d\n", lineno);
}

First I generate the C source code with yacc and I get the shift/reduce conflicts, why those shift/reduce conflicts are present?

$ make
yacc -d hoc1.y
yacc: 11 shift/reduce conflicts.

You can try writing it like this: `expr '+' '+' { $$ = $1 + 1; } ` — maja, Aug 31 '23 at 14:53
When it comes to making Yacc-based parsers, I found it very useful to take it slow, and in *small* steps. Start with e.g. `list: expr '\n'` and `expr: NUMBER` *only*. Then add another `expr` rule, and another, one by one. Perhaps the problem isn't what you think it is? — Some programmer dude, Aug 31 '23 at 14:54
By the way, don't forget to terminate your rules with a `;`. — Some programmer dude, Aug 31 '23 at 14:55
@maja That will allow e.g. `value + +` which is usually very different from `value ++`. — Some programmer dude, Aug 31 '23 at 14:56
I checked with `-Wcounterexamples`. The issue is with `%`. Adding a `%left '%'` should fix it. The `++` is irrelavant. — Weijun Zhou, Aug 31 '23 at 15:05
(`%` usually has the same precedence as `*` and `/`, so you actually want `%left '*' '/' '%'`) — ikegami, Aug 31 '23 at 15:18
@WeijunZhou found the issue, now I understand what was happening. Thanks for all the comments, all of them are useful and appreciated. — M.E., Aug 31 '23 at 16:26

score 0 · Answer 1 · answered Aug 31 '23 at 22:16

The shift/reduce conflicts all come from the ambiguous '%' operator, as you haven't specified a precedence or associativity for it. If you add it to your precedence rules (same precedence as '*' and '/'), they will go away:

%left '*' '/' '%'              /* LEFT ASSOC., HIGHER PRECEDENCE */

score 0 · Answer 2 · answered Sep 01 '23 at 05:59

Thanks to the help from the comments, I am now able to turn my comment into an answer.

If you run the original file through a recent enough version of GNU bison, you will see the following error message,

warning: 11 shift/reduce conflicts [-Wconflicts-sr]
note: rerun with option '-Wcounterexamples' to generate conflict counterexamples

Now if we follow the suggestion and add -Wcounterexamples, we can see the details of the shift-reduce conflicts:

warning: shift/reduce conflict on token '%' [-Wcounterexamples]
  Example: '-' expr . '%' expr
  Shift derivation
    expr
    `-> 6: '-' expr
               `-> 11: expr . '%' expr
  Reduce derivation
    expr
    `-> 11: expr              '%' expr
            `-> 6: '-' expr .
warning: shift/reduce conflict on token '%' [-Wcounterexamples]
  Example: expr '+' expr . '%' expr
  Shift derivation
    expr
    `-> 7: expr '+' expr
                    `-> 11: expr . '%' expr
  Reduce derivation
    expr
    `-> 11: expr                   '%' expr
            `-> 7: expr '+' expr .
test.y: warning: shift/reduce conflict on token '%' [-Wcounterexamples]
  Example: expr '-' expr . '%' expr
  Shift derivation
    expr
    `-> 8: expr '-' expr
                    `-> 11: expr . '%' expr
  Reduce derivation
    expr
    `-> 11: expr                   '%' expr
            `-> 8: expr '-' expr .
warning: shift/reduce conflict on token '%' [-Wcounterexamples]
  Example: expr '*' expr . '%' expr
  Shift derivation
    expr
    `-> 9: expr '*' expr
                    `-> 11: expr . '%' expr
  Reduce derivation
    expr
    `-> 11: expr                   '%' expr
            `-> 9: expr '*' expr .
warning: shift/reduce conflict on token '%' [-Wcounterexamples]
  Example: expr '/' expr . '%' expr
  Shift derivation
    expr
    `-> 10: expr '/' expr
                     `-> 11: expr . '%' expr
  Reduce derivation
    expr
    `-> 11: expr                    '%' expr
            `-> 10: expr '/' expr .
warning: shift/reduce conflict on token INC [-Wcounterexamples]
  Example: expr '%' expr . INC
  Shift derivation
    expr
    `-> 11: expr '%' expr
                     `-> 5: expr . INC
  Reduce derivation
    expr
    `-> 5: expr                    INC
           `-> 11: expr '%' expr .
warning: shift/reduce conflict on token '+' [-Wcounterexamples]
  Example: expr '%' expr . '+' expr
  Shift derivation
    expr
    `-> 11: expr '%' expr
                     `-> 7: expr . '+' expr
  Reduce derivation
    expr
    `-> 7: expr                    '+' expr
           `-> 11: expr '%' expr .
warning: shift/reduce conflict on token '-' [-Wcounterexamples]
  Example: expr '%' expr . '-' expr
  Shift derivation
    expr
    `-> 11: expr '%' expr
                     `-> 8: expr . '-' expr
  Reduce derivation
    expr
    `-> 8: expr                    '-' expr
           `-> 11: expr '%' expr .
warning: shift/reduce conflict on token '*' [-Wcounterexamples]
  Example: expr '%' expr . '*' expr
  Shift derivation
    expr
    `-> 11: expr '%' expr
                     `-> 9: expr . '*' expr
  Reduce derivation
    expr
    `-> 9: expr                    '*' expr
           `-> 11: expr '%' expr .
warning: shift/reduce conflict on token '/' [-Wcounterexamples]
  Example: expr '%' expr . '/' expr
  Shift derivation
    expr
    `-> 11: expr '%' expr
                     `-> 10: expr . '/' expr
  Reduce derivation
    expr
    `-> 10: expr                    '/' expr
            `-> 11: expr '%' expr .
warning: shift/reduce conflict on token '%' [-Wcounterexamples]
  Example: expr '%' expr . '%' expr
  Shift derivation
    expr
    `-> 11: expr '%' expr
                     `-> 11: expr . '%' expr
  Reduce derivation
    expr
    `-> 11: expr                    '%' expr
            `-> 11: expr '%' expr .

Here we can see what exactly the 11 shift-reduce conflicts are and how the same expression can be parsed in two different ways. Notably, the last example is expr '%' expr . '%' expr, which only involves the % operator, and it is a strong hint that the % operator itself is to be blame for the conflicts. We can then check the rest 10 cases and see that their counterexamples all involve the % operator.

According to the GNU bison manual, shift/reduce conflicts can be resolved in two ways:

When possible, you should rather use precedence directives to fix the conflicts explicitly ...

The second option basically means rewriting the grammar itself such that it is no longer ambiguous. In this case, the first option is enough to solve the issue. For more details, see How to solve a shift/reduce conflict?

Specifying the associativity of % can solve the issue, but we also need to guarantee the correct precedence (Credits to @Someprogrammerdude in the comments). Assuming the same operator precedence as in the C standard, the final solution is to replace the line

%left '*' '/'

with

%left '*' '/' '%'

Now for the question in the title. The correct way to implement an increment operator is the way in OP's code. Semantically increment should be a single token, so it is better to treat it as such at the lexer level, instead of generating two PLUS tokens and then combining them in a fragile way in the parser level. The important thing is that it's much easier to fix issues at the lexer level, if that is at all possible, than fixing at the parser level.

I have a personal experience writing a bash-like parser using yacc. I originally made & a token for background execution(BACK), and > a token for redirection(REDIR). Then tried to combine BACK and REDIR at the parser level to &> which means both redirecting the stdout and stderr. That generated a lot of shift-reduce conflicts and even reduce-reduce conflicts (reduce-reduce indicates serious errors according to the GNU bison manual). I can of course spend a lot of time to rewrite the grammar so that it is not ambiguous, but if I just fix it at the lexer level (lex &> as REDIRBOTH instead of BACK + REDIR), the issue is just gone without further fix needed.

How to incorporate ++ operator in yacc

2 Answers2