0

This is probably a simple question and has already been asked but I'm having difficulty understanding Bison and in particular operator precedence. If I have this code and + has left association.

%left '+'
%%
S:
   S E ’\n’ { printf("%d\n", $2); }
|
; 

E:
   num { $$ = $1; }
| E '+' E {$$ = $1 + $3;}
| E '*'  E {$$ = $1 * $3;}
; 
%%

The input is 2+3+4*5 and the output is 25. My answer was 45.

Could someone show me step by step what Bison does? I mean how elements are pushed to the stack and how and when they are reduced. Or even the parse tree if is possible.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Mattia
  • 130
  • 2
  • 11
  • 1
    As written, that grammar won't compile. You need quotes around the `+` in the first line; to close the quotes around the `*` near the end; and you need to add a precedence rule for `*` to avoid shift-reduce conflicts. – rici Jun 19 '18 at 22:25
  • I'm not trying to compile but understand the process – Mattia Jun 19 '18 at 22:28
  • 2
    If your example doesn't compile, it means there are errors which make it impossible to explain "how it works". It doesn't work, and your intentions are therefore not clear enough. For example, the absence of a precedence declaration for `*` means that your parser has a shift-reduce conflict which was resolved through a different mechanism; it's not possible to explain "what's happening" without going into detail about this different mechanism, which is probably irrelevant to your understanding of this mechanism. – rici Jun 19 '18 at 22:31
  • Well, the process will be different for slightly different but correct programs. So it wouldn't do much good for me to guess from this incorrect program what your program that produced output 25 was and explain that. – aschepler Jun 19 '18 at 22:32
  • Did you try this explanation: https://stackoverflow.com/a/26785562/1566221 – rici Jun 19 '18 at 22:35
  • @rici you don't need to add precedence declaration for all operator. yacc use default if it not declared so it shift. This was from a test, i'm pretty sure it was like this. I don't understand why it does (2+3)+(4*5). Why when it reach the second + it doesn't reduce but shift? Yes I tried that explanation. – Mattia Jun 19 '18 at 22:38
  • @Mattia: from what you say, it appears that Bison has assigned `*` a higher precedence than `+`, which means it yields 25 rather than 45, which is presumably what you're hoping for. Because you aren't explicit, you get what you get. Don't use implicit precedences — it confuses you (and those who help you maintain your grammar). – Jonathan Leffler Jun 19 '18 at 22:41
  • 1
    When it reaches the second +, it does reduce. If it didn't reduce, you wouldn't have gotten (2+3). (2+3) is the result of a reduction. – rici Jun 19 '18 at 22:41
  • @JonathanLeffler this was an exercise on an exam. It was written like this. Do you know why bison has assigned higher precedence to the `*` even if `+` has left association? At the point 5+4*5 and the lookahead reads a `*` why it decide to do shifting instead reduction? – Mattia Jun 19 '18 at 22:49
  • 1
    as you yourself said, bison/yacc's default resolution is to prefer to shift. If there is no applicable precedence declaration, the default resolution applies. – rici Jun 19 '18 at 22:51
  • 1
    @Mattia: it may be that my 'explanation' is wrong; it isn't so much higher precedence as … default handling. I wouldn't attempt to use a grammar like the one shown. If I was presented with it for code review, it would go back for fixing before further review. You're unlucky that your examiner thinks such tricky coding is good for exams. It's abominable to need to know the darkest corners of the language when the code can be written clearly. – Jonathan Leffler Jun 19 '18 at 22:54
  • 1
    @jonathan: Yacc/Bison never assigns a default precedence. It's default action is to shift, period. If no precedence declaration is applicable, bison will report shift-reduce conflicts and default to shifting to resolve them. See example in my answer and note the bison warning. – rici Jun 19 '18 at 23:02

1 Answers1

3

The easiest way to see what is going on in a grammar is to enable bison's trace facility, explained in the bison manual section on debugging parsers. It's useful to have the state machine handy while you're reading traces, since the trace provides the path through the state machine. To see the state machine, use bison's -v option, which will create a file with extension .output.

$ cat minimal.y
%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char* msg) {
  fprintf(stderr, "%s\n", msg);
}
%}
%token num
%left '+'
%%
S: S E '\n' { printf("%d\n", $2); }
 |
E: num { $$ = $1; }
 | E '+' E {$$ = $1 + $3;}
 | E '*'  E {$$ = $1 * $3;}
%%
int yylex(void) {
  int c;
  do c = getchar(); while (c == ' ');
  if (isdigit(c)) {
    yylval = c - '0';
    return num;
  }
  return c == EOF ? 0 : c;
}
int main(int argc, char* argv[]) {
#if YYDEBUG
  yydebug = 1;
#endif
  return yyparse();
}

Compile and run:

$ bison -t -v -o minimal.c minimal.y
minimal.y: warning: 3 shift/reduce conflicts [-Wconflicts-sr]
$ gcc -Wall -o minimal minimal.c
$ ./minimal <<<'2+3+4*5'
Starting parse
Entering state 0
Reducing stack by rule 2 (line 14):
-> $$ = nterm S ()
Stack now 0

I snipped the trace (although you can see it at the bottom of the answer). Look through the trace for the line which says that it is reading the token *:

Entering state 8
Reading a token: Next token is token '*' ()
Shifting token '*' ()
Entering state 7

Here's the definition of State 8 from minimal.output, complete with shift-reduce conflict (indicated by the square brackets around the action which will not be taken) and the default resolution:

State 8

    4 E: E . '+' E
    4  | E '+' E .
    5  | E . '*' E

    '*'  shift, and go to state 7

    '*'       [reduce using rule 4 (E)]
    $default  reduce using rule 4 (E)

Here's the complete trace (although I strongly encourage you to do the experiment on your own machine):

Starting parse
Entering state 0
Reducing stack by rule 2 (line 14):
-> $$ = nterm S ()
Stack now 0
Entering state 1
Reading a token: Next token is token num ()
Shifting token num ()
Entering state 3
Reducing stack by rule 3 (line 16):
   $1 = token num ()
-> $$ = nterm E ()
Stack now 0 1
Entering state 4
Reading a token: Next token is token '+' ()
Shifting token '+' ()
Entering state 5
Reading a token: Next token is token num ()
Shifting token num ()
Entering state 3
Reducing stack by rule 3 (line 16):
   $1 = token num ()
-> $$ = nterm E ()
Stack now 0 1 4 5
Entering state 8
Reading a token: Next token is token '+' ()
Reducing stack by rule 4 (line 17):
   $1 = nterm E ()
   $2 = token '+' ()
   $3 = nterm E ()
-> $$ = nterm E ()
Stack now 0 1
Entering state 4
Next token is token '+' ()
Shifting token '+' ()
Entering state 5
Reading a token: Next token is token num ()
Shifting token num ()
Entering state 3
Reducing stack by rule 3 (line 16):
   $1 = token num ()
-> $$ = nterm E ()
Stack now 0 1 4 5
Entering state 8
Reading a token: Next token is token '*' ()
Shifting token '*' ()
Entering state 7
Reading a token: Next token is token num ()
Shifting token num ()
Entering state 3
Reducing stack by rule 3 (line 16):
   $1 = token num ()
-> $$ = nterm E ()
Stack now 0 1 4 5 8 7
Entering state 9
Reading a token: Next token is token '\n' ()
Reducing stack by rule 5 (line 18):
   $1 = nterm E ()
   $2 = token '*' ()
   $3 = nterm E ()
-> $$ = nterm E ()
Stack now 0 1 4 5
Entering state 8
Next token is token '\n' ()
Reducing stack by rule 4 (line 17):
   $1 = nterm E ()
   $2 = token '+' ()
   $3 = nterm E ()
-> $$ = nterm E ()
Stack now 0 1
Entering state 4
Next token is token '\n' ()
Shifting token '\n' ()
Entering state 6
Reducing stack by rule 1 (line 13):
   $1 = nterm S ()
   $2 = nterm E ()
   $3 = token '\n' ()
25
-> $$ = nterm S ()
Stack now 0
Entering state 1
Reading a token: Now at end of input.
Shifting token $end ()
Entering state 2
Stack now 0 1 2
Cleanup: popping token $end ()
Cleanup: popping nterm S ()
rici
  • 234,347
  • 28
  • 237
  • 341
  • Thanks! The steps will help. How do you print also the state ? How do I interpreter the dot? It means that it has read until that point so it shouldn't be only E . `*` E because E `+` E was reduce to E? Tomorrow I will test and see how it goes – Mattia Jun 19 '18 at 23:13
  • 1
    @Mattia: As I said, you use the `-v` option to generate the `.output` file, and then you look through it with a text editor. The dot is explained in any textbook which describes LR parsing. Look for "LR items". – rici Jun 19 '18 at 23:23
  • Is the grammar S-> S E, E-> num, E -> E + E, E -> E * E ? If I construct the LALR(1) has it the same states of the one created by yacc? I've always seen S -> E not S -> S E. In the end there is E followed by $ EOF and I don't understand how it match the rule S -> S E to reduce it the start symbol – Mattia Jun 20 '18 at 10:44
  • @mattia: `S` has two productions: `S -> S E` and `S ->`. It is the second which permits the recursion to terminate. Yacc/bison "augments" the grammar by adding a start symbol `$start -> S $end`; this is normal in LR parsing algorithms, for reasons which may be explained in your textbook. (Basically it makes it possible to recognise the end of the input.) – rici Jun 20 '18 at 14:09
  • I know about the the augmented grammar. Changing the first rule to `S: E '\n' { printf("%d\n", $1); }` seems to work also. Now I'm trying to understand why is printing 5 if use `$2` instead of `$1` like this `S: E '\n' { printf("%d\n", $2); }`. I'm reading [this](http://www.gnu.org/software/bison/manual/html_node/index.html#SEC_Contents) – Mattia Jun 20 '18 at 14:13
  • @mattia: `S -> S E | %empty` lets it parse multiple `E`s. `S -> E` only parses one. Referring to $2 for a lexical token for which the lexical scanner doesn't modify `yylval` is a programming error with undefined consequences, but as a hint: if the lexer doesn't change the value of `yylval`, it's still the same as it was before. – rici Jun 20 '18 at 14:23
  • Sorry last question, I made some test but I still have to understand this: `S -> S E | %empty lets it parse multiple Es`. If I use the production `S-> E`, the only difference I can see in the parse tree is that the last `E` gets reduced to a `S` instead having `S E` reduced to an `S`. [Picture](https://i.imgur.com/gbnCzCR.jpg) of the 2 parse tree – Mattia Jun 21 '18 at 10:33
  • @mattia: perhaps I misunderstood your question. If you were asking about the difference between `S -> S E | %empty` and `S -> S E | E`, the answer is that the first one accepts an empty input while the second one insists that there be at least one expression. I thought you were asking about the difference between `S -> S E | %empty` and `S -> E`. – rici Jun 21 '18 at 13:27
  • Although that difference may seem minor, many people prefer `S -> S E | %empty` because it reduces code duplication: handling of the `E` reduction (such as printing out the value) only needs to be done in one action, instead of duplicating it in both places. – rici Jun 21 '18 at 13:31
  • Yes, I was asking about the difference between `S -> S E | %empty and S -> E`. So it's only because it reduces code duplication. Thanks – Mattia Jun 21 '18 at 13:53
  • @mattia: Not throwing an error on empty input is sometimes nice, too :-) And I still don't know which pair of grammars you are asking about. – rici Jun 21 '18 at 14:19
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/173569/discussion-between-mattia-and-rici). – Mattia Jun 21 '18 at 14:54