3

I'm trying to generate the parser table using the lemon parser generator, but the .out file generated when I run lemon grammar.y only contains the states of the automaton.

Is there a way to also get the goto table for non-terminals, not only the states of the automaton? Or this can only be done by reading the generated code? Are there any other tools that can generate both the action and the goto tables?

PS:

The .out file (generated by lemon) for a simple grammar looks like this:

State 0:
          start ::= * e
          e ::= * e PLUS t
          e ::= * t
          t ::= * t MUL f
          t ::= * f
          f ::= * LPAR e RPAR
          f ::= * ID

                          LPAR shift        1      
                            ID shift        4      
                         start accept
                             e shift        11     
                             t shift        6      
                             f shift        5      

State 1:
          e ::= * e PLUS t
          e ::= * t
          t ::= * t MUL f
          t ::= * f
          f ::= * LPAR e RPAR
          f ::= LPAR * e RPAR
          f ::= * ID

                          LPAR shift        1      
                            ID shift        4      
                             e shift        10     
                             t shift        6      
                             f shift        5      

State 2:
          e ::= e PLUS * t
          t ::= * t MUL f
          t ::= * f
          f ::= * LPAR e RPAR
          f ::= * ID

                          LPAR shift        1      
                            ID shift        4      
                             t shift        9      
                             f shift        5      

State 3:
          t ::= t MUL * f
          f ::= * LPAR e RPAR
          f ::= * ID

                          LPAR shift        1      
                            ID shift        4      
                             f shift        8      

State 4:
      (6) f ::= ID *

                             $ reduce       6      f ::= ID
                          PLUS reduce       6      f ::= ID
                           MUL reduce       6      f ::= ID
                          RPAR reduce       6      f ::= ID

State 5:
      (4) t ::= f *

                             $ reduce       4      t ::= f
                          PLUS reduce       4      t ::= f
                           MUL reduce       4      t ::= f
                          RPAR reduce       4      t ::= f

State 6:
      (2) e ::= t *
          t ::= t * MUL f

                             $ reduce       2      e ::= t
                          PLUS reduce       2      e ::= t
                           MUL shift        3      
                          RPAR reduce       2      e ::= t

State 7:
      (5) f ::= LPAR e RPAR *

                             $ reduce       5      f ::= LPAR e RPAR
                          PLUS reduce       5      f ::= LPAR e RPAR
                           MUL reduce       5      f ::= LPAR e RPAR
                          RPAR reduce       5      f ::= LPAR e RPAR

State 8:
      (3) t ::= t MUL f *

                             $ reduce       3      t ::= t MUL f
                          PLUS reduce       3      t ::= t MUL f
                           MUL reduce       3      t ::= t MUL f
                          RPAR reduce       3      t ::= t MUL f

State 9:
      (1) e ::= e PLUS t *
          t ::= t * MUL f

                             $ reduce       1      e ::= e PLUS t
                          PLUS reduce       1      e ::= e PLUS t
                           MUL shift        3      
                          RPAR reduce       1      e ::= e PLUS t

State 10:
          e ::= e * PLUS t
          f ::= LPAR e * RPAR

                          PLUS shift        2      
                          RPAR shift        7      

State 11:
      (0) start ::= e *
          e ::= e * PLUS t

                             $ reduce       0      start ::= e
                          PLUS shift        2      

----------------------------------------------------
Symbols:
    0: $:
    1: PLUS
    2: MUL
    3: LPAR
    4: RPAR
    5: ID
    6: error:
    7: start: LPAR ID
    8: e: LPAR ID
    9: t: LPAR ID
   10: f: LPAR ID
Paul
  • 20,883
  • 7
  • 57
  • 74
  • That looks like an action table to me. What were you expecting which is not in the output? – rici Nov 04 '15 at 06:53
  • The *goto table* or maybe known as the *jump table*. – Paul Nov 04 '15 at 09:32
  • And the lines of the form ` shift ` are what? – rici Nov 04 '15 at 11:54
  • Those are actions, for terminal symbols. I need the jump table for when I have a non-terminal on the stack. – Paul Nov 04 '15 at 14:44
  • If you know about this file that lemon generates, could you explain it, please? – Paul Nov 04 '15 at 14:44
  • `e shift 11` (in state 0) is the goto action for `e` in state 0. There is really no difference between a shift action and a goto action; in both cases, a state and a symbol (terminal in the first case, non-terminal in the second case) are pushed onto the stack and a transition is taken to the new state. If it makes it easier for you to understand, mentally change the word `shift` to `goto` if it follows a non-terminal. – rici Nov 04 '15 at 15:17
  • Could you please write the steps you take when parsing the expression `ID` using the above file? I cannot seem to get past the `$ reduce 6 f ::= ID` in `State 4`, ie. the second step of the parse. – Paul Nov 04 '15 at 15:45
  • The LR parsing algorithm is Algorithm 4.7 in the Dragon book. (Basically, to do a reduce action, you pop the rhs symbols and associated states from the stack, and then shift the lhs symbol along with the indicated state from the goto table; the shift is *just like* the shift of a terminal.) – rici Nov 04 '15 at 15:58
  • So let's say that I do `$ reduce 6 f ::= ID`, then on the top of the stack I will have `... f 6`, so if I go in `State 6`, there is no transition for my `f` symbol on the stack, and I'm stuck. PS: Thanks for your patience. – Paul Nov 04 '15 at 16:03
  • reduce 6 doesn't have anything to do with state 6. It means "reduce using production 6". So the stack was `... q X 4 ID` where q is a state number and X is a symbol. The rhs of production 6 has one symbol (ID), so you pop it and its associated state from the stack, leaving you with `... q X`. You then consult state q to find out what to do with `f`, which will be an action of the form `goto q'` (written as `shift q'` by lemon), so you then have `... q X q' f` and you're in state `q'` with the same lookahead symbol. – rici Nov 04 '15 at 19:22
  • @rici Your comment finally made it click for me, thanks! Can you please add your answer so I can accept it? Otherwise I'll reply my own question so that other people will understand what's going on. – Paul Nov 06 '15 at 08:23
  • Hope the answer sums up the comments. – rici Nov 06 '15 at 14:44

1 Answers1

4

Lemon outputs the action table and the goto table in a single block. The goto function looks like shift actions, except that the lookahead is a non-terminal rather than a terminal.

So if we take State 0:

 LPAR shift        1      
   ID shift        4      
start accept
    e shift        11     
    t shift        6      
    f shift        5      

The first two lines are the actions on reading LPAR and ID, respectively. The remaining lines are the goto function, which is used when a reduce action reveals this state by popping the stack. (Unlike a traditional LR machine, in Lemon the accept action is in the goto table rather than in the action table for the end-of-input pseudo-terminal.)

Although most descriptions of the LR parser distinguish between the action table and the goto table, there is very little difference between a "shift" action and the "goto" part of a reduce action. Both of these push the current state number and a symbol onto the parser stack. The difference is that a reduce action (such as reduce 6, which means "reduce using production 6" -- it has nothing to do with state 6) first pops the right-hand side of the indicated production off of the stack and sets the current state to the newly-revealed state on the top of the stack before executing the shift/goto. (Another difference is that after a shift action, it is necessary to read a new lookahead token, whereas the reduce action does not consume the input.)

rici
  • 234,347
  • 28
  • 237
  • 341
  • Still I'd add that after a reduce we use the left side of a production (non-terminal) and the state left on the stack after popping the right hand side, to determine the new action. – Paul Nov 06 '15 at 19:50