21

I am trying to build a Lisp grammar. Easy, right? Apparently not.

I present these inputs and receive errors...

( 1 1)
23 23 23 
ui ui

This is the grammar...

%%
sexpr: atom                 {printf("matched sexpr\n");}
    | list
    ;
list: '(' members ')'       {printf("matched list\n");}
    | '('')'                {printf("matched empty list\n");}
    ;
members: sexpr              {printf("members 1\n");}
    | sexpr members         {printf("members 2\n");}
    ;
atom: ID                    {printf("ID\n");}
    | NUM                   {printf("NUM\n");}
    | STR                   {printf("STR\n");}
    ;
%%

As near as I can tell, I need a single non-terminal defined as a program, upon which the whole parse tree can hang. But I tried it and it didn't seem to work.

edit - this was my "top terminal" approach:

program: slist;

slist: slist sexpr | sexpr;

But it allows problems such as:

( 1 1 

Edit2: The FLEX code is...

%{
    #include <stdio.h>
    #include "a.yacc.tab.h"
    int linenumber;
    extern int yylval;
%}
%%
\n                         { linenumber++; }
[0-9]+                     { yylval = atoi(yytext); return NUM; }
\"[^\"\n]*\"               { return STR; }
[a-zA-Z][a-zA-Z0-9]*       { return ID; }
.
%%

An example of the over-matching...

(1 1 1)
NUM
matched sexpr
NUM
matched sexpr
NUM
matched sexpr
(1 1
NUM
matched sexpr
NUM
matched sexpr

What's the error here?

edit: The error was in the lexer.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Paul Nathan
  • 39,638
  • 28
  • 112
  • 212
  • What does your output look like when it parses the ( 1 1 ? I can't see how it gets to a point of not expecting the closing ). – Beardo Feb 05 '09 at 18:43
  • 1
    Could you please post your lex/flex file too? Maybe there's an error. Also, you should not use characters '(' in the grammar if you use a lexer, I'm not really sure how these get along. – jpalecek Feb 05 '09 at 18:57
  • What's strange is that even the valid list (1 1 1) doesn't come up as a match for a list. I would try two things, first make members left recursive: members: members sexpr | sexpr; Second, swap the order of list in sexpr: list | atom; See if that works. – Beardo Feb 06 '09 at 15:40
  • Beardo, jpalecek had the answer - for this bug. :) – Paul Nathan Feb 07 '09 at 15:37
  • One of your comments down the page suggests that you will be using this on an embedded device. If so, please mention this in your question. – SingleNegationElimination Jun 06 '09 at 13:25

7 Answers7

13

Lisp grammar can not be represented as context-free grammar, and yacc can not parse all lisp code. It is because of lisp features such as read-evaluation and programmable reader. So, in order just to read an arbitrary lisp code, you need to have a full lisp running. This is not some obscure, non-used feature, but it is actually used. E.g., CL-INTERPOL, CL-SQL.

If the goal is to parse a subset of lisp, then the program text is a sequence of sexprs.

dmitry_vk
  • 4,399
  • 18
  • 21
  • 4
    Actually, that depends on the version of Lisp. If you're not targetting Common Lisp, or you're trying to bootstrap, a CFG can be a good place to start. – David Thornley Feb 05 '09 at 18:49
12

The error is really in the lexer. Your parentheses end up as the last "." in the lexer, and don't show up as parentheses in the parser.

Add rules like

\)     { return RPAREN; }
\(     { return LPAREN; }

to the lexer and change all occurences of '(', ')' to LPAREN and RPAREN respectively in the parser. (also, you need to #define LPAREN and RPAREN where you define your token list)

Note: I'm not sure about the syntax, could be the backslashes are wrong.

jpalecek
  • 47,058
  • 7
  • 102
  • 144
4

You are correct in that you need to define a non-terminal. That would be defined as a set of sexpr. I'm not sure of the YACC syntax for that. I'm partial to ANTLR for parser generators and the syntax would be:

program: sexpr*

Indicating 0 or more sexpr.

Update with YACC syntax:

program :  /* empty */
        | program sexpr
        ;

Not in YACC, but might be helpful anyway, here's a full grammar in ANTLR v3 that works for the cases you described(excludes strings in the lexer because it's not important for this example, also uses C# console output because that's what I tested it with):

program: (sexpr)*;

sexpr: list
    |  atom            {Console.WriteLine("matched sexpr");}
    ;

list:     
   '('')'              {Console.WriteLine("matched empty list");}
   | '(' members ')'   {Console.WriteLine("matched list");}

    ;

members: (sexpr)+      {Console.WriteLine("members 1");};

atom: Id               {Console.WriteLine("ID");}
    | Num              {Console.WriteLine("NUM");}
    ;


Num: ( '0' .. '9')+;
Id: ('a' .. 'z' | 'A' .. 'Z')+;
Whitespace : ( ' ' | '\r' '\n' | '\n' | '\t' ) {Skip();};

This won't work exactly as is in YACC because YACC generates and LALR parser while ANTLR is a modified recursive descent. There is a C/C++ output target for ANTLR if you wanted to go that way.

Beardo
  • 1,542
  • 1
  • 14
  • 27
2

Do you neccesarily need a yacc/bison parser? A "reads a subset of lisp syntax" reader isn't that hard to implement in C (start with a read_sexpr function, dispatch to a read_list when you see a '(', that in turn builds a list of contained sexprs until a ')' is seen; otherwise, call a read_atom that collects an atom and returns it when it can no longer read atom-constituent characters).

However, if you want to be able to read arbritary Common Lisp, you'll need to (at the worst) implement a Common Lisp, as CL can modify the reader run-time (and even switch between different read-tables run-time under program control; quite handy when you're wanting to load code written in another language or dialect of lisp).

Vatine
  • 20,782
  • 4
  • 54
  • 70
  • I'm not shooting for common lisp. My desire is that eventually I can load this onto an embedded device and write/rewrite/compose control functions on the fly. I'm using flex/bison because I know how to use them. Not because they are the best option, per se. – Paul Nathan Feb 06 '09 at 19:34
1

I just tried it, my "yacc lisp grammar" works fine :

%start exprs

exprs:
    | exprs expr
    /// if you prefer right recursion :
    /// | expr exprs
    ;

list:
    '(' exprs ')'
    ;

expr:
    atom
    | list
    ;

atom:
    IDENTIFIER
    | CONSTANT
    | NIL
    | '+'
    | '-'
    | '*'
    | '^'
    | '/'
    ;
reuns
  • 854
  • 10
  • 14
1

It's been a long time since I worked with YACC, but you do need a top-level non-terminal. Could you be more specific about "tried it" and "it didn't seem to work"? Or, for that matter, what the errors are?

I'd also suspect that YACC might be overkill for such a syntax-light language. Something simpler (like recursive descent) might work better.

David Thornley
  • 56,304
  • 9
  • 91
  • 158
  • Updated. Also, I've used the flex/bison tools before, so it makes my life easier. – Paul Nathan Feb 05 '09 at 18:38
  • I would definitely agree with the recursive descent parser for lisp - or even a push-down automaton – Eclipse Feb 05 '09 at 18:43
  • Okay, I'm stumped, it shouldn't be accepting unbalanced parentheses. I assume you're using lex/flex; how does lex/flex define ID? What you have looks good to me, so I'd like to see more of what you're doing. – David Thornley Feb 05 '09 at 18:52
1

You could try this grammar here.

Eclipse
  • 44,851
  • 20
  • 112
  • 171