4

I played a bit with the BNF Converter and tried to re-engineer parts of the Mathematica language. My BNF had already about 150 lines and worked OK, until I noticed a very basic bug. Brackets [] in Mathematica are used for two different things

  1. expr[arg] to call a function
  2. list[[spec]] to access elements of an expression, e.g. a List

Let's assume I want to create the parser for a language which consists only of identifiers, function calls, element access and sequence of expressions as arguments. These forms would be valid

f[]
f[a]
f[a,b,c]
f[[a]]
f[[a,b]]

f[a,f[b]]
f[[a,f[x]]]

A direct, but obviously wrong input-file for BNFC could look like

entrypoints Expr ;

TSymbol.        Expr1 ::= Ident ;
FunctionCall.   Expr ::= Expr "[" [Sequence] "]" ;
Part.           Expr ::= Expr "[[" [Sequence] "]]" ;    
coercions Expr 1 ;

separator Sequence "," ;
SequenceExpr. Sequence ::= Expr ;

This BNF does not work for the last two examples of the first code-block.

The problem seems to be located in the created Yylex lexer file, which matches ] and ]] separately. This is wrong, because as can be seen in the last to examples, whether or not it's a closing ] or ]] depends on the context. So either you have to create a stack of braces to ensure the right matching or you leave that to the parser.

Can someone enlighten me whether it's possible to realize this with BNFC?

(Btw, other hints would be gratefully taken too)

halirutan
  • 4,281
  • 18
  • 44
  • 1
    @HighPerformanceMark I don't speak about the front end here, I speak about the kernel, which is obviously able to parse `f[g[x]]` correctly. If I want to use BNFC to create (at least) an incomplete Mathematica parser, I have to find a solution for this; Whether or not it was bad design by Wolfram. – halirutan Jan 10 '13 at 10:46

1 Answers1

3

Your problem is the token "]]". If the lexer collects this without having any memory of its past, it might be mistaken. So just don't do that!

The parser by definition remembers its left context, so you can get it to do the bracket matching correctly.

I would define your grammar this way:

FunctionCall.   Expr ::= Expr "[" [Sequence] "]" ;
Part.           Expr ::= Expr "[" "[" [Sequence] "]" "]" ;   

with the lexer detecting only single "[" "]" as tokens.

An odd variant:

FunctionCall.   Expr ::= Expr "[" [Sequence] "]" ;
Part.           Expr ::= Expr "[[" [Sequence] "]" "]" ; 

with the lexer also detecting "[[" as a token, since it can't be mistaken.

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • The really dumb thing is, I had already this solution and obviously didn't reloaded the classes when trying it :-( OK, this works. Thank you. – halirutan Jan 11 '13 at 06:11
  • Alternatively, lex left-hand brackets by the rule that you take two if you see two, else take one (more than two is an error condition). For the right hand brackets, use the state of the parser to tell you how many to put into a lexeme. This way you keep the lexemes of single vs double brackets distinct. I think that is the right thing to do (both : and = are lexemes, but you would not use them separately in order to form :=). – Daniel Lichtblau Jan 14 '13 at 14:52
  • My gorsh, I just realized my prior comment ended in an inadvertant emoticon. I expect this portends The End of Days or something. Another reason one needs to be careful with lexemes. – Daniel Lichtblau Jan 15 '13 at 00:21