If you only want to parse a subset of Lisp (esp. a simple subset of Scheme), you can write that parser yourself, m//gc
style and with a stack:
sub parse {
my $_ = shift;
pos($_) = 0;
my @stack = ([]);
while (pos($_) < length($_)) {
m/\G\s+/gc and next; # skip whitespace
if (m/\G\(/gc) { # opening parens
push @stack, [];
} elsif (m/\G\)/gc) { # closing parens
my $list = pop @stack;
push @{ $stack[-1] }, $list;
} elsif (m/([\w-.]+)/gc) { # identifiers, numbers
push @{ $stack[-1] }, $1;
} else {
die "I'm at @{[pos($_)]} and I have no idea how to parse this";
}
}
@stack == 1 or die "Closing parens expected at the end";
return $stack[0];
}
This is fairly minimal, but can parse basic Lisp. It gets more difficult when you want to allow reader macros or the quasi-quote, or strings. One should also provide better error messages.
With Marpa, the above loop wouldn't be changed much; instead of push
ing, we would feed the tokens to the reckognizer.
my $grammar = Marpa::R2::Grammar->new({
..., # some other options here
soure => \(<<'END_OF_GRAMMAR),
:start ::= Atom
List ::= (ParenL) AtomList (ParenR) action => ::first
Atom ::= List action => ::first
| Number action => ::first
| Identifier action => ::first
AtomList ::= Atom+
END_OF_GRAMMAR
});
$grammar->precompute; # compile the grammar
This would expect the terminal symbols ParenL
, ParenR
, Number
, Identifier
.
In our parse
sub, we first have to make a new recognizer
my $rec = Marpa::R2::Recognizer({ grammar => $grammar });
And modify the actions in our tokenizer loop:
my ($type, $value);
if (m/\G\(/gc) {
($type, $value) = (ParenL => undef);
} elsif (m/\G\)/gc) {
($type, $value) = (ParenR => undef);
} elsif (m/\G([0-9]+(?:\.[0-9]+))/gc) {
($type, $value) = (Number => $1);
} elsif (m/\G([\w-]+)/gc) {
($type, $value) = (Identifier => $1);
} else {
die ...;
}
unless (defined $rec->read($type, $value) {
die "Error at position @{[pos($_)]}. Expecting any of\n",
map " * $_\n", @{ $rec->terminals_expected };
}
And we can extract the parse tree by
my $ref = $rec->value;
unless (defined $ref) {
die "The input couldn't be parsed";
}
return $$ref;
In our case, the parse tree would be a bunc of nested array refs. But you can provide custom actions so that you can produce a more complex AST. E.g. blessing each node of the tree to an object, and then calling compile
on the root node could be a strategy.