My colleague PaulS asked me the following:
I'm writing a parser for an existing language (SystemVerilog - an IEEE standard), and the specification has a rule in it that is similar in structure to this:
    cover_point
        =
        [[data_type] identifier ':' ] 'coverpoint' identifier ';'
        ;

    data_type
        =
        'int' | 'float' | identifier
        ;

    identifier
        =
        ?/\w+/?
        ;
The problem is that when parsing the following legal string:
    anIdentifier: coverpoint another_identifier;
anIdentifier matches data_type (via its identifier option) successfully, so Grako then looks for another identifier after it, finds ':' instead, and fails. It doesn't then go back and try to parse without the data_type part.
I can re-write the rule as follows,
    cover_point_rewrite
        =
        [data_type identifier ':' | identifier ':' ] 'coverpoint' identifier ';'
        ;
but I wonder:
- whether this is intentional, and
- whether there's a better syntax.
Is this a PEG-in-general issue, or a tool (Grako) one?
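To make the behaviour in the question concrete, here is a minimal sketch that reproduces it with a few hand-rolled PEG-style combinators in plain Python. This is not Grako's implementation or API: the helper names (token, pattern, seq, choice, optional, matches) are hypothetical, whitespace skipping is simplified, and there is no keyword name guard. The only assumption carried over is the usual PEG reading of an optional: once it has matched something, it is never retried as empty.

```python
import re

# Each parser takes (text, pos) and returns the new position, or None on failure.

def skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos

def token(literal):
    def parse(text, pos):
        pos = skip_ws(text, pos)
        return pos + len(literal) if text.startswith(literal, pos) else None
    return parse

def pattern(regex):
    compiled = re.compile(regex)
    def parse(text, pos):
        pos = skip_ws(text, pos)
        m = compiled.match(text, pos)
        return m.end() if m else None
    return parse

def seq(*parsers):
    # All parts must match in order; any failure fails the whole sequence.
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    # Ordered choice: the first alternative that succeeds wins.
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def optional(parser):
    # Greedy optional: keep the match if there is one, otherwise match empty.
    # A later failure never comes back here to try the empty case.
    def parse(text, pos):
        end = parser(text, pos)
        return pos if end is None else end
    return parse

identifier = pattern(r"\w+")
data_type = choice(token("int"), token("float"), identifier)

# [[data_type] identifier ':' ] 'coverpoint' identifier ';'
cover_point = seq(
    optional(seq(optional(data_type), identifier, token(":"))),
    token("coverpoint"), identifier, token(";"),
)

# [data_type identifier ':' | identifier ':' ] 'coverpoint' identifier ';'
cover_point_rewrite = seq(
    optional(choice(seq(data_type, identifier, token(":")),
                    seq(identifier, token(":")))),
    token("coverpoint"), identifier, token(";"),
)

def matches(rule, text):
    end = rule(text, 0)
    return end is not None and skip_ws(text, end) == len(text)

text = "anIdentifier: coverpoint another_identifier;"
print(matches(cover_point, text))          # False
print(matches(cover_point_rewrite, text))  # True
```

In the first rule, data_type consumes anIdentifier, the following identifier fails on ':', the whole bracketed group falls back to empty, and 'coverpoint' then fails against anIdentifier. In the rewrite, the failure of the first alternative hands control to the second one, which is why it parses. Since the sketch uses nothing Grako-specific, it suggests the behaviour comes from PEG-style semantics in general, though it says nothing about Grako's internals.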