2

As personal exercise for learning perl I want write one "lisp-like grammar" to javascript compiler in the perl.

"Lisp-like", so not full Lisp implementation, and I hope than the final grammar will be "context-free" grammar (LALR) and relatively easily compilable to native Javascript.

Making lexical analyzer should be no problem (probably with Parse::Flex), but need help with the selection of syntactic analyzer generators.

In the CPAN found 3 and need help with selection /read: what one I should learn :)/ for the above task.

The questions are:

  • What is the most suitable for the lisp-like languages?
  • What one have the less steep learning curve (so, exists many examples for learning) (for example I found only few Marpa examples)
Jonathan Hall
  • 75,165
  • 16
  • 143
  • 189
clt60
  • 62,119
  • 17
  • 107
  • 194
  • basic Lisp grammar is very simple. I would not overthink this one. You could write your own parser for it in perl quite easily. n.b., parse::recdescent is a good parser, but overkill here imo. – Paul Nathan May 08 '13 at 04:46
  • A year ago, you commented one my answer http://stackoverflow.com/a/6643214/632407 , and in this question only you changed Mason to "Lisp-like language". I bet you again playing with the idea to make the compiler to get an language for templates, what will compile into javascript for the latter execution with Javascript::V8. I like the idea and this is the only reason why I wrote an answer for otherwise unanswerable and very broadly scoped question. :) – clt60 May 08 '13 at 08:07
  • 1
    Here's a [list of Lisps in JavaScript](http://ceaude.twoticketsplease.de/js-lisps.html). – Joshua Taylor Jun 21 '13 at 02:25

3 Answers3

4

If you only want to parse a subset of Lisp (esp. a simple subset of Scheme), you can write that parser yourself, m//gc style and with a stack:

sub parse {
  my $_ = shift;
  pos($_) = 0;
  my @stack = ([]);
  while (pos($_) < length($_)) {
    m/\G\s+/gc and next; # skip whitespace
    if (m/\G\(/gc) { # opening parens
      push @stack, [];
    } elsif (m/\G\)/gc) { # closing parens
      my $list = pop @stack;
      push @{ $stack[-1] }, $list;
    } elsif (m/([\w-.]+)/gc) { # identifiers, numbers
      push @{ $stack[-1] }, $1;
    } else {
      die "I'm at @{[pos($_)]} and I have no idea how to parse this";
    }
  }
  @stack == 1 or die "Closing parens expected at the end";
  return $stack[0];
}

This is fairly minimal, but can parse basic Lisp. It gets more difficult when you want to allow reader macros or the quasi-quote, or strings. One should also provide better error messages.

With Marpa, the above loop wouldn't be changed much; instead of pushing, we would feed the tokens to the reckognizer.

my $grammar = Marpa::R2::Grammar->new({
  ..., # some other options here
  soure => \(<<'END_OF_GRAMMAR),
  :start ::= Atom

  List ::= (ParenL) AtomList (ParenR) action => ::first
  Atom ::= List          action => ::first
       |   Number        action => ::first
       |   Identifier    action => ::first
  AtomList ::= Atom+
END_OF_GRAMMAR
});
$grammar->precompute; # compile the grammar

This would expect the terminal symbols ParenL, ParenR, Number, Identifier.

In our parse sub, we first have to make a new recognizer

my $rec = Marpa::R2::Recognizer({ grammar => $grammar });

And modify the actions in our tokenizer loop:

my ($type, $value);
if (m/\G\(/gc) {
  ($type, $value) = (ParenL => undef);
} elsif (m/\G\)/gc) {
  ($type, $value) = (ParenR => undef);
} elsif (m/\G([0-9]+(?:\.[0-9]+))/gc) {
  ($type, $value) = (Number => $1);
} elsif (m/\G([\w-]+)/gc) {
  ($type, $value) = (Identifier => $1);
} else {
  die ...;
}
unless (defined $rec->read($type, $value) {
  die "Error at position @{[pos($_)]}. Expecting any of\n",
       map " * $_\n", @{ $rec->terminals_expected };
}

And we can extract the parse tree by

my $ref = $rec->value;
unless (defined $ref) {
  die "The input couldn't be parsed";
}
return $$ref;

In our case, the parse tree would be a bunc of nested array refs. But you can provide custom actions so that you can produce a more complex AST. E.g. blessing each node of the tree to an object, and then calling compile on the root node could be a strategy.

amon
  • 57,091
  • 2
  • 89
  • 149
4

Lisp systems themselves aren't built using parser generators. Common Lisp is parsed entirely using read tables which classify input characters into various categories. Some characters are in categories that trigger registered functions based on one or two character combinations. Some character types are gathered into tokens. The gory details are in the spec. This readtable idea came from MacLisp (See P. 11 of Evolution of Lisp).

Parser generating techniques (LALR(1) and its ilk) were invented by computer scientists like back in the 1960's, who must have felt challenged by the complicated syntax of emerging programming languages. Having moved onto the greener pastures of semantics, Lisp hackers basically rejected that, taking instead the position of not inventing an unnecessary problem (complicated context-free grammars with ambiguities) which then entail a search for a complicated solution (time and space efficient parsers, generated by machine).

So if you parse Lisp with some such thing, you're essentially committing heresy. :)

Note that something very similar to Lisp's readtables occurs in TeX: software ironically invented by the same man who invented the LR parser.

Kaz
  • 55,781
  • 9
  • 100
  • 149
2

IMHO you can use yapp - lisp has simple grammar. Check this question. You should check CPAN for "lisp" and "javascript" and you will found:

  • the pretty fresh CPAN module CljPerl - perl/lisp bridge
  • Template::Javascript
  • and maybe you should check Meta-Html (written in C) and Sibilant.js (written in Javascript and compiles into javascript) for ideas about the your future "Lisp-like" language :)
Community
  • 1
  • 1
clt60
  • 62,119
  • 17
  • 107
  • 194