0

Following up on Converting W3C's EBNF to BNF

With a grammar defined as enclosed below, how can it know when a grammar rule is ended?

E.g., line 3 & 4,

Production*
Production

How can the grammar tell that the second Production is not part of Item, which can be repeated as Item Item ..., according to the rule of "(Item ( '-' Item | Item* ))?"

Most languages solve such problems with an ending symbol like ";". Is the following grammar defined precisely enough so that such an ending symbol is not necessary?

Grammar
 ::= 
Production*
Production
 ::= 

NCName '::=' Choice
NCName
 ::= 
[http://www.w3.org/TR/xml-names/#NT-NCName]

Choice
 ::= 
SequenceOrDifference ( '|' SequenceOrDifference )*

SequenceOrDifference
 ::= 
(Item ( '-' Item | Item* ))?

Item
 ::= 
Primary ( '?' | '*' | '+' )*

Primary
 ::= 
NCName | StringLiteral | CharCode | CharClass | '(' Choice ')'
StringLiteral
 ::= 
'"' [^"]* '"' | "'" [^']* "'"
/* ws: explicit */
CharCode
 ::= 
'#x' [0-9a-fA-F]+
/* ws: explicit */
CharClass
 ::= 
'[' '^'? ( Char | CharCode | CharRange | CharCodeRange )+ ']'
/* ws: explicit */
Char
 ::= 
[http://www.w3.org/TR/xml#NT-Char]
CharRange
 ::= 
Char '-' ( Char - ']' )
/* ws: explicit */
CharCodeRange
 ::= 
CharCode '-' CharCode
/* ws: explicit */
Link
 ::= 
'[' URL ']'
URL
 ::= 
[^#x5D:/?#]+ '://' [^#x5D#]+ ('#' NCName)?
/* ws: explicit */
Whitespace
 ::= 
S | Comment
S
 ::= 
#x9 | #xA | #xD | #x20
Comment
 ::= 
'/*' ( [^*] | '*'+ [^*/] )* '*'* '*/'
/* ws: explicit */
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
xpt
  • 20,363
  • 37
  • 127
  • 216
  • W3C EBNF doesn't end a production with a semicolon because a `::=` operator comes after the LHS symbol of a new production. In the case of the last production in the grammar the EOF condition indicates that the rule has ended. – kaby76 Jan 13 '22 at 09:53
  • Ah, got you! I was trying to parse it with LR(0), and what you're saying is that I need at least LR(1) to parse such grammar, right @kaby76? – xpt Jan 13 '22 at 14:51
  • 1
    For sure, not LR(0). I can't say for sure if this would be LR(1) unless I work out the automaton, or use an online parser generator to check (e.g. [Grammophone](S -> a S b . S -> .)). Your link to W3C EBNF looks good, so you just need to convert to Grammophone syntax and try. – kaby76 Jan 13 '22 at 15:30
  • 1
    Sorry, the link is https://mdaines.github.io/grammophone/ . Others you might try: https://www.cs.princeton.edu/courses/archive/spring20/cos320/LL1/ https://pegjs.org/online http://www.fit.vutbr.cz/~ikocman/llkptg/ – kaby76 Jan 13 '22 at 16:03

0 Answers0