0

I'm trying to parse identifiers in the Verilog language. The full grammar is here.

They can have the forms below:

name
name[index]
name[start:stop]
name[index][start:stop]
name.(any of the above)
name[index].(any of the above)
name[index].name[index]... .name[index][start:stop]

Or in EMBF format:

(name ([index])?)+ ([start:stop])?

Here name is a typical identifier as in most programming languages, while index, start and stop are integer.

I'm new to yacc (I'm actually using Jison) but I'm not sure that this can be correctly interpreted at all with the single lookahead token limitation. If name and [ are in the stack, it cannot tell the difference between index and start.

This is my grammar so far:

primary 
: number
| hierarchical_identifier bracketted_range_expression
| hierarchical_identifier
;

primary 
: number
| hierarchical_identifier
| hierarchical_identifier bracketted_range_expression
;

hierarchical_identifier
: IDENTIFIER
| IDENTIFIER '[' UNSIGNED_NUMBER ']'
| hierarchical_identifier '.' IDENTIFIER
| hierarchical_identifier '.' IDENTIFIER '[' UNSIGNED_NUMBER ']'
;

bracketted_range_expression
: '[' range_expression ']';

range_expression
: UNSIGNED_NUMBER ':' UNSIGNED_NUMBER

Which yields several shift/reduce and reduce/reduce errors, and it simply does not want to parse something line foo[1:0]. It expects a ] instead fo the :. Any way to solve this?

jpcgt
  • 2,138
  • 2
  • 19
  • 34

1 Answers1

1

Your analysis is correct. With your grammar, hierarchical_identifier must be reduced before the parser can start scanning bracketed_range_expression, but it cannot know whether the [ following an IDENTIFIER starts a bracketed_range_expression (in which case the reduction is necessary) or is the [ in '[' UNSIGNED_NUMBER ']' (in which case it should be shifted).

With bison, we could solve this using a GLR parser, but with jison we're limited to LALR(1). Fortunately, the language is still LALR(1); all that is needed is to defer the shift/reduce decision by expanding some nonterminals and reducing them later.

Unfortunately, the result is a bit ugly because it requires a certain amount of duplication. Because we need to always be able to shift the [, we end up "denormalizing" the grammar (to borrow a term from database design).

Here's one solution, tested with the jison on-line tool. (The actions were only intended to show that the grammar attaches the range to the entire hierarchical list, and not just to the last identifier in the list.)

/* lexical grammar */
%lex
%%

\s+                   /* skip whitespace */
[0-9]+                return 'NUMBER'
[a-zA-Z][a-zA-Z0-9]*  return 'IDENTIFIER'
.                     return yytext[0]
<<EOF>>               return 'EOF'

/lex

%start expr

%% /* language grammar */

expr   : primary EOF { return $1; }
       ;

primary: NUMBER
       | hierarchical_identifier
       | hierarchical_identifier_with_range
       ;

indexed_identifier
       : IDENTIFIER '[' NUMBER ']' { $$ = { "id": $1, "idx": $3}; } ;

postfix_range
       : '[' NUMBER ':' NUMBER ']' { $$ = [ $2, $4 ]; } ;

hierarchical_identifier
       : IDENTIFIER         { $$ = []; $$.push({ "id": $1}); }
       | indexed_identifier { $$ = [ $1 ]; }
       | hierarchical_identifier '.' IDENTIFIER
                            { $$ = $1; $$.push({ "id": $3}); }
       | hierarchical_identifier '.' indexed_identifier
                            { $$ = $1; $$.push($3); }
       ;

hierarchical_identifier_with_range
       : IDENTIFIER postfix_range
                            { $$ = { "idlist": [{"id": $1}],
                                     "lo": $2[0], "hi": $2[1]}; }
       | indexed_identifier postfix_range
                            { $$ = { "idlist": [$1],
                                     "lo": $2[0], "hi": $2[1]}; }
       | hierarchical_identifier '.' IDENTIFIER postfix_range
                            { $1.push({"id": $3});
                              $$ = { "idlist": $1,
                                     "lo": $4[0], "hi": $4[1]}; }
       | hierarchical_identifier '.' indexed_identifier postfix_range
                            { $1.push($3);
                              $$ = { "idlist": $1,
                                     "lo": $4[0], "hi": $4[1]}; }
       ;

If you eventually plan to use bison, you will probably find that a GLR parser is easier, without adding too much parsing overhead (since your grammar really is unambiguous).

rici
  • 234,347
  • 28
  • 237
  • 341
  • +1 This works. Could you please explain the logic behind your changes? I might need to fix multiple problems like this and I'm not fully grasping the technique for solving the generic problem. – jpcgt Aug 02 '15 at 15:01
  • @jpcgt: The basic principle is the one I outlined: if a single token lookahead is not sufficient, work out how to defer the decision. If there is a known finite lookahead, this can be done mechanically, but you can often do better with a bit of ingenuity. Here, the grammar is LR(3), so the mechanical solution would work. All I did was to extend the productions which need more lookahead until the conflict can be resolved. – rici Aug 02 '15 at 15:53
  • @jpcgt: But if at all possible, use a GLR parser and avoid the grammar manipulation. – rici Aug 02 '15 at 15:56
  • I'm really new at this so I'm having trouble understanding how the decision is being deferred. What do you mean by extending the productions? Breaking them into smaller pieces? I know, this might require a lengthy explanation, so perhaps you could point me to a good article? I'm stick with Jison at this time, but eventually might move to Bison for production. – jpcgt Aug 02 '15 at 16:23
  • @jpcgt: Compare what I wrote with what you wrote. You had the right-hand side: `hierarchical_identifier bracketted_range_expression`; that doesn't work because it is not possible to know whether to reduce `hierarchical_identifier` at the point when it might need to be reduced. So I introduce a new non-terminal, `hierarchical_identifier_with_range`, and expand out enough of the production so that shift/reduce decisions can be made. I don't know how to explain it better; after you've done it a few times, it becomes easier. I can point you to other examples, though. – rici Aug 02 '15 at 16:30
  • @jpcgt: http://stackoverflow.com/a/19340247/1566221 http://stackoverflow.com/a/16931641/1566221 http://stackoverflow.com/a/26189559/1566221 – rici Aug 02 '15 at 16:35